Posted to solr-user@lucene.apache.org by Michael Brandt <mi...@colorado.edu> on 2012/08/29 23:17:59 UTC

Maximum index size on single instance of Solr

Hi all,

I am looking for information on how many documents may be indexed by a
single instance of Solr (not using shards) before performance issues are
encountered. In searching the internet I've come across some varying
answers; one answer suggests 50 GB is
problematic <http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p3656848.html>;
this blog post <http://harish11g.blogspot.com/2012/02/apache-solr-sharding-amazon-ec2.html> on
sharding Solr in AWS says sharding is not necessary until you have
"millions of records," but is no more specific.

What experiences have you had with this? At what point did you find it
necessary to scale up Solr, in terms of both number of records and size of
index (whether MB, GB, etc.)?

Thanks,
Michael Brandt

Re: Maximum index size on single instance of Solr

Posted by Erick Erickson <er...@gmail.com>.
Here's a blog outlining why this is so hard to answer:
http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Just one example from your post: you mention index size as
a metric. It's often useless. Stored data (stored="true") is placed
in files with their own extensions (*.fdt and *.fdx). These have
virtually no effect on search requirements. They can occupy
10% of your on-disk space or 90% of it...

Gotta prototype and measure....
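
As a starting point for that measuring, here is a minimal sketch (not part of the original message) that reports what fraction of an index directory is taken up by the stored-field files mentioned above. The function names are hypothetical, and it assumes the index is not using compound files: when segments are packed into *.cfs files, the stored data is hidden inside them and this would undercount.

```python
import os
from collections import defaultdict

# Extensions that hold stored-field data, as named in the message above.
STORED_FIELD_EXTENSIONS = {".fdt", ".fdx"}


def size_by_extension(index_dir):
    """Return a dict mapping file extension to total bytes under index_dir."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def stored_field_share(index_dir):
    """Return the fraction (0.0 to 1.0) of on-disk bytes in stored-field files."""
    totals = size_by_extension(index_dir)
    total_bytes = sum(totals.values())
    if total_bytes == 0:
        # Empty or missing directory: nothing to report.
        return 0.0
    stored_bytes = sum(totals.get(ext, 0) for ext in STORED_FIELD_EXTENSIONS)
    return stored_bytes / total_bytes
```

Calling stored_field_share() on your own index directory gives a number between 0.0 and 1.0. Two indexes of the same total size can land at opposite ends of that range, which is why raw index size says so little.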

Best
Erick

On Wed, Aug 29, 2012 at 5:45 PM, Michael Della Bitta
<mi...@appinions.com> wrote:
> Unfortunately the answer for this can vary quite a bit based on a
> number of factors:
>
> 1. Whether or not fields are stored,
> 2. Document size,
> 3. Total term count,
> 4. Solr version
>
> etc.
>
> We have two major indexes, one for servicing online queries, and one
> for batch processing. Our batch index is performance critical and
> therefore is optimized for throughput, is stored in RAM, and has
> fewer stored fields than the online query one. The batch index shards
> are 25 GB or less, and we're trending toward smaller and more numerous
> shards. This is with Solr 1.4, and I'm just finishing up our migration
> to 3.6.1.
>
> Michael Della Bitta
>
> P.S. Why'd you CC honeybadger? Honeybadger don't care...

Re: Maximum index size on single instance of Solr

Posted by Michael Della Bitta <mi...@appinions.com>.
Unfortunately the answer for this can vary quite a bit based on a
number of factors:

1. Whether or not fields are stored,
2. Document size,
3. Total term count,
4. Solr version

etc.

We have two major indexes, one for servicing online queries, and one
for batch processing. Our batch index is performance critical and
therefore is optimized for throughput, is stored in RAM, and has
fewer stored fields than the online query one. The batch index shards
are 25 GB or less, and we're trending toward smaller and more numerous
shards. This is with Solr 1.4, and I'm just finishing up our migration
to 3.6.1.

Michael Della Bitta

P.S. Why'd you CC honeybadger? Honeybadger don't care...

------------------------------------------------
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game



Re: Maximum index size on single instance of Solr

Posted by Michael Brandt <mi...@gmail.com>.
Thanks everyone!

On Thu, Aug 30, 2012 at 11:11 AM, pravesh <su...@yahoo.com> wrote:

> We have a 48 GB index on a single shard, with 20+ million documents. We
> recently migrated to Solr 3.5. We do have a cluster of Solr servers for
> hosting searches, but I do see us migrating to Solr sharding going forward.
>
>
> Thanx
> Pravesh

Re: Maximum index size on single instance of Solr

Posted by pravesh <su...@yahoo.com>.
We have a 48 GB index on a single shard, with 20+ million documents. We
recently migrated to Solr 3.5. We do have a cluster of Solr servers for
hosting searches, but I do see us migrating to Solr sharding going forward.


Thanx
Pravesh




--
View this message in context: http://lucene.472066.n3.nabble.com/Maximum-index-size-on-single-instance-of-Solr-tp4004171p4004418.html
Sent from the Solr - User mailing list archive at Nabble.com.