Posted to dev@lucene.apache.org by Marco Ciaramella <ci...@gmail.com> on 2010/10/14 00:19:15 UTC

What is the maximum number of documents that can be indexed ?

Hi all,
I am working on a performance specification document on a Solr/Lucene-based
application; this document is intended for the final customer. My question
is: what is the maximum number of documents I can index, assuming 10 or
20 kbytes for each document?

I could not find a precise answer to this question, and I tend to think a
Solr index is limited in practice only by the JVM, the operating system
(limits on large-file support), or hardware constraints (mainly RAM, disk
space, etc.).

Thanks
Marco
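
For a performance spec, a rough capacity bound can be sketched with back-of-envelope arithmetic. The sketch below is illustrative only: the 35% index-to-raw-size ratio and the 500 GB disk are assumptions, not measured figures. The one hard ceiling is that Lucene addresses documents within a single index with a 32-bit integer, so one index cannot hold more than about 2^31 documents.

```python
# Back-of-envelope capacity estimate for a Solr/Lucene index.
# Illustrative arithmetic only; the overhead ratio and disk size are assumptions.

LUCENE_DOC_LIMIT = 2**31 - 1  # doc IDs are 32-bit ints within one Lucene index

def max_docs(disk_bytes, doc_bytes, index_to_raw_ratio=0.35):
    """Estimate how many documents fit on disk, assuming the index occupies
    roughly `index_to_raw_ratio` of the raw document size (an assumption;
    stored fields, term vectors, etc. can push this much higher)."""
    per_doc = doc_bytes * index_to_raw_ratio
    return min(int(disk_bytes / per_doc), LUCENE_DOC_LIMIT)

if __name__ == "__main__":
    disk = 500 * 10**9  # assume a 500 GB volume
    for doc_kb in (10, 20):
        n = max_docs(disk, doc_kb * 1024)
        print(f"{doc_kb} kB docs -> roughly {n:,} documents")
```

In other words, at these document sizes the practical limit is disk (and merge/RAM behavior), not the 32-bit document ceiling, which only matters past roughly two billion documents per index.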

Re: What is the maximum number of documents that can be indexed ?

Posted by Allistair Crossley <al...@roxxor.co.uk>.
Me also. Great book, just wanted a bit more on complex DIH :)

On Oct 14, 2010, at 10:38 AM, Jason Brown wrote:

> Not related to the opening thread - but wanted to thank Eric for his book. Clarified a lot of stuff and very useful.
> 
> 
> -----Original Message-----
> From: Eric Pugh [mailto:epugh@opensourceconnections.com]
> Sent: Thu 14/10/2010 15:34
> To: solr-user@lucene.apache.org
> Subject: Re: What is the maximum number of documents that can be indexed ?
> 
> I would recommend looking at the work the HathiTrust has done.  They have published some really great blog articles about their work scaling Solr, and they have put in huge amounts of data.
> 
> The good news is that there isn't an exact number, because "It depends".   The bad news is that there isn't an exact number because "it depends"!
> 
> Eric
> 
> 
> 
> On Oct 13, 2010, at 8:58 PM, Otis Gospodnetic wrote:
> 
>> Marco (use solr-user@lucene list to follow up, please),
>> 
>> There are no precise answers to such questions.  Solr can keep indexing.  The 
>> limit is, I think, the available disk space.  I've never pushed Solr or Lucene 
>> to the point where Lucene index segments would become a serious pain, but even 
>> that can be controlled.  Same thing with number of open files, large file 
>> support, etc.
>> 
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>> 
>> 
>>> 
>>> From: Marco Ciaramella <ci...@gmail.com>
>>> To: dev@lucene.apache.org
>>> Sent: Wed, October 13, 2010 6:19:15 PM
>>> Subject: What is the maximum number of documents that can be indexed ?
>>> 
>>> Hi all,
>>> I am working on a performance specification document on a Solr/Lucene-based 
>>> application; this document is intended for the final customer. My question is: 
>>> what is the maximum number of document I can index assuming 10 or 20kbytes for 
>>> each document? 
>>> 
>>> 
>>> I could not find a precise answer to this question, and I tend to consider that 
>>> Solr index can be virtually limited only by the JVM, the Operating System 
>>> (limits to large file support), or by hardware constraints (mainly RAM, etc. ... 
>>> ). 
>>> 
>>> 
>>> Thanks
>>> Marco
>>> 
>>> 
>>> 
> 
> -----------------------------------------------------
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
> Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
> Free/Busy: http://tinyurl.com/eric-cal
> 
> If you wish to view the St. James's Place email disclaimer, please use the link below
> 
> http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer


RE: What is the maximum number of documents that can be indexed ?

Posted by Jason Brown <Ja...@sjp.co.uk>.
Not related to the opening thread - but wanted to thank Eric for his book. Clarified a lot of stuff and very useful.


-----Original Message-----
From: Eric Pugh [mailto:epugh@opensourceconnections.com]
Sent: Thu 14/10/2010 15:34
To: solr-user@lucene.apache.org
Subject: Re: What is the maximum number of documents that can be indexed ?
 
I would recommend looking at the work the HathiTrust has done.  They have published some really great blog articles about their work scaling Solr, and they have put in huge amounts of data.

The good news is that there isn't an exact number, because "It depends".   The bad news is that there isn't an exact number because "it depends"!

Eric



On Oct 13, 2010, at 8:58 PM, Otis Gospodnetic wrote:

> Marco (use solr-user@lucene list to follow up, please),
> 
> There are no precise answers to such questions.  Solr can keep indexing.  The 
> limit is, I think, the available disk space.  I've never pushed Solr or Lucene 
> to the point where Lucene index segments would become a serious pain, but even 
> that can be controlled.  Same thing with number of open files, large file 
> support, etc.
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
>> 
>> From: Marco Ciaramella <ci...@gmail.com>
>> To: dev@lucene.apache.org
>> Sent: Wed, October 13, 2010 6:19:15 PM
>> Subject: What is the maximum number of documents that can be indexed ?
>> 
>> Hi all,
>> I am working on a performance specification document on a Solr/Lucene-based 
>> application; this document is intended for the final customer. My question is: 
>> what is the maximum number of document I can index assuming 10 or 20kbytes for 
>> each document? 
>> 
>> 
>> I could not find a precise answer to this question, and I tend to consider that 
>> Solr index can be virtually limited only by the JVM, the Operating System 
>> (limits to large file support), or by hardware constraints (mainly RAM, etc. ... 
>> ). 
>> 
>> 
>> Thanks
>> Marco
>> 
>> 
>> 

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal











Re: What is the maximum number of documents that can be indexed ?

Posted by Eric Pugh <ep...@opensourceconnections.com>.
I would recommend looking at the work the HathiTrust has done.  They have published some really great blog articles about their work scaling Solr, and they have put in huge amounts of data.

The good news is that there isn't an exact number, because "It depends".   The bad news is that there isn't an exact number because "it depends"!

Eric



On Oct 13, 2010, at 8:58 PM, Otis Gospodnetic wrote:

> Marco (use solr-user@lucene list to follow up, please),
> 
> There are no precise answers to such questions.  Solr can keep indexing.  The 
> limit is, I think, the available disk space.  I've never pushed Solr or Lucene 
> to the point where Lucene index segments would become a serious pain, but even 
> that can be controlled.  Same thing with number of open files, large file 
> support, etc.
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
>> 
>> From: Marco Ciaramella <ci...@gmail.com>
>> To: dev@lucene.apache.org
>> Sent: Wed, October 13, 2010 6:19:15 PM
>> Subject: What is the maximum number of documents that can be indexed ?
>> 
>> Hi all,
>> I am working on a performance specification document on a Solr/Lucene-based 
>> application; this document is intended for the final customer. My question is: 
>> what is the maximum number of document I can index assuming 10 or 20kbytes for 
>> each document? 
>> 
>> 
>> I could not find a precise answer to this question, and I tend to consider that 
>> Solr index can be virtually limited only by the JVM, the Operating System 
>> (limits to large file support), or by hardware constraints (mainly RAM, etc. ... 
>> ). 
>> 
>> 
>> Thanks
>> Marco
>> 
>> 
>> 

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal









Re: What is the maximum number of documents that can be indexed ?

Posted by "scott chu (朱炎詹)" <sc...@udngroup.com>.
Solr is designed with a scalable architecture, so your question depends on how 
many resources (CPU, memory, disk space, etc.) you have to scale Solr: how high 
(within a single machine), how wide (how fast you wish to respond to users, 
using replication), and how deep (how many slices/partitions, which Solr's 
designers call 'shards', you wish to split a huge index into, if any).

In my experience, analyzing the data itself and collecting (or estimating) the 
likely usage patterns of your users is more important. I can share one of my 
experiences with you. Initially we tried to build one Solr architecture to 
serve all the search requests from our users. Then we re-analyzed our users' 
past searches and found that over 65% of all searches concentrated on news 
from the most recent week. So we split into two Solr deployments on two 
physical architectures, and 70% of the machines were invested in the first 
one, serving searches over the most recent week of news.

Although I believe there must be a technical maximum somewhere, I don't think 
we'll reach that limit in normal usage.
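
The recent-vs-archive split described above can be pictured as a small routing layer in front of the two deployments. This is only a sketch under assumptions: the core URLs are hypothetical placeholders, and the one-week window is taken from the post.

```python
# Sketch of routing between a "recent" and an "archive" Solr deployment,
# as in the split described above. The URLs are hypothetical placeholders.
from datetime import datetime, timedelta

RECENT_CORE = "http://solr-recent.example.com/solr"    # assumed endpoint
ARCHIVE_CORE = "http://solr-archive.example.com/solr"  # assumed endpoint

def core_for(doc_date, now=None, window_days=7):
    """Pick the deployment a document (or a date-restricted query)
    should go to, based on its age relative to `now`."""
    now = now or datetime.utcnow()
    if now - doc_date <= timedelta(days=window_days):
        return RECENT_CORE   # the majority of traffic lands here, per the post
    return ARCHIVE_CORE
```

Queries without a date restriction would still have to fan out to both deployments (for example via Solr's distributed-search `shards` parameter) and merge the results.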

----- Original Message ----- 
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: <de...@lucene.apache.org>
Cc: <so...@lucene.apache.org>
Sent: Thursday, October 14, 2010 8:58 AM
Subject: Re: What is the maximum number of documents that can be indexed ?


> Marco (use solr-user@lucene list to follow up, please),
>
> There are no precise answers to such questions.  Solr can keep indexing. 
> The
> limit is, I think, the available disk space.  I've never pushed Solr or 
> Lucene
> to the point where Lucene index segments would become a serious pain, but 
> even
> that can be controlled.  Same thing with number of open files, large file
> support, etc.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>>
>>From: Marco Ciaramella <ci...@gmail.com>
>>To: dev@lucene.apache.org
>>Sent: Wed, October 13, 2010 6:19:15 PM
>>Subject: What is the maximum number of documents that can be indexed ?
>>
>>Hi all,
>>I am working on a performance specification document on a 
>>Solr/Lucene-based
>>application; this document is intended for the final customer. My question 
>>is:
>>what is the maximum number of document I can index assuming 10 or 20kbytes 
>>for
>>each document?
>>
>>
>>I could not find a precise answer to this question, and I tend to consider 
>>that
>>Solr index can be virtually limited only by the JVM, the Operating System
>>(limits to large file support), or by hardware constraints (mainly RAM, 
>>etc. ...
>>).
>>
>>
>>Thanks
>>Marco
>>
>>
>>




Re: What is the maximum number of documents that can be indexed ?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Marco (use solr-user@lucene list to follow up, please),

There are no precise answers to such questions.  Solr can keep indexing.  The 
limit is, I think, the available disk space.  I've never pushed Solr or Lucene 
to the point where Lucene index segments would become a serious pain, but even 
that can be controlled.  Same thing with number of open files, large file 
support, etc.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Marco Ciaramella <ci...@gmail.com>
>To: dev@lucene.apache.org
>Sent: Wed, October 13, 2010 6:19:15 PM
>Subject: What is the maximum number of documents that can be indexed ?
>
>Hi all,
>I am working on a performance specification document on a Solr/Lucene-based 
>application; this document is intended for the final customer. My question is: 
>what is the maximum number of document I can index assuming 10 or 20kbytes for 
>each document? 
>
>
>I could not find a precise answer to this question, and I tend to consider that 
>Solr index can be virtually limited only by the JVM, the Operating System 
>(limits to large file support), or by hardware constraints (mainly RAM, etc. ... 
>). 
>
>
>Thanks
>Marco
>
>
> 
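
Otis's point that segment counts and open files "can be controlled" maps to the merge settings in solrconfig.xml. A sketch for the Solr 1.4 era follows; the values shown are the common defaults, given purely for illustration rather than as tuning advice.

```xml
<!-- Fragment of a Solr 1.4-era solrconfig.xml; values are illustrative. -->
<indexDefaults>
  <!-- Higher values mean more segments (and more open files) but faster
       indexing; lower values mean fewer segments and faster searches. -->
  <mergeFactor>10</mergeFactor>
  <!-- RAM used to buffer documents before flushing a new segment. -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <!-- Set to true to pack each segment into a single compound file,
       reducing open file handles at some indexing cost. -->
  <useCompoundFile>false</useCompoundFile>
</indexDefaults>
```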
