Posted to solr-user@lucene.apache.org by Jame Vaalet <jv...@capitaliq.com> on 2011/07/04 13:51:00 UTC

what s the optimum size of SOLR indexes

Hi,

What would be the maximum size of a single SOLR index that still gives optimum search time?
If I have to index all the documents in my repository (which is terabytes in size), what would be the ideal architecture to follow? Distributed SOLR?

Regards,
JAME VAALET
Software Developer
EXT :8108
Capital IQ


Re: what s the optimum size of SOLR indexes

Posted by Mohammad Shariq <sh...@gmail.com>.
There are several solutions for indexing huge amounts of data, e.g. SolrCloud,
ZooKeeperIntegration, MultiCore, MultiShard.
Depending on your requirements, you can choose one or another.
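
As a rough illustration of the multi-shard option, Solr's distributed search lets one node fan a query out to several shards through the `shards` request parameter. The sketch below only builds such a request URL; the host names, ports, and the query are made-up placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to several shards.

    coordinator and shards are host:port/path strings (hypothetical here).
    """
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, without the http:// scheme.
        "shards": ",".join(shards),
    }
    return f"http://{coordinator}/select?{urlencode(params)}"


# Placeholder hosts: two Solr instances on one machine.
url = distributed_select_url(
    "localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "title:lucene",
)
print(url)
```

The node that receives the request merges the per-shard results, so the client still talks to a single URL.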


On 4 July 2011 17:21, Jame Vaalet <jv...@capitaliq.com> wrote:

> Hi,
>
> What would be the maximum size of a single SOLR index file for resulting in
> optimum search time ?
> In case I have got to index all the documents in my repository  (which is
> in TB size) what would be the ideal architecture to follow , distributed
> SOLR ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


-- 
Thanks and Regards
Mohammad Shariq

RE: what s the optimum size of SOLR indexes

Posted by "Burton-West, Tom" <tb...@umich.edu>.
Hello,

On Mon, 2011-07-04 at 13:51 +0200, Jame Vaalet wrote:
> What would be the maximum size of a single SOLR index file for resulting in optimum search time ?

How do you define optimum? Do you want the fastest possible response time at any cost, or do you have a specific response time goal?

Can you give us more details on your use case? What kind of load are you expecting? What kind of queries do you need to support?
Some of the trade-offs depend on whether you are CPU bound or I/O bound.

Assuming a fairly large index, if you *absolutely need* the fastest possible search response time and you can *afford the hardware*, you probably want to shard your index and size your indexes so they can all fit in memory (and do some work to make sure the index data is always in memory).

If you can't afford that much memory, but still need very fast response times, you might want to size your indexes so they all fit on SSDs.

As an example of a use case on the opposite side of the spectrum, here at HathiTrust, we have a very low number of queries per second and we are running an index that totals 6 TB in size with shards of about 500GB and average response times of 200ms (but 99th percentile times of about 2 seconds).
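
The sizing arithmetic behind this advice can be sketched in a few lines. The HathiTrust figures (6 TB total, roughly 500GB per shard) come from the message above; the machine RAM, JVM heap, and OS reserve values are purely hypothetical, and the helper names are invented for illustration.

```python
import math


def shards_needed(total_index_gb, per_shard_gb):
    """Number of shards required to hold the whole index."""
    return math.ceil(total_index_gb / per_shard_gb)


def cache_budget_gb(machine_ram_gb, jvm_heap_gb, os_reserve_gb=2):
    """RAM left over for the OS page cache, which is what holds index files.

    All three inputs are assumptions about a hypothetical machine.
    """
    return machine_ram_gb - jvm_heap_gb - os_reserve_gb


# Disk-based setup from the message: 6 TB in ~500 GB shards
# (treating 1 TB as 1000 GB).
disk_shards = shards_needed(6000, 500)
print(disk_shards)  # 12

# Hypothetical all-in-memory setup: 64 GB machines with an 8 GB JVM heap.
per_shard = cache_budget_gb(64, 8)
print(per_shard, shards_needed(6000, per_shard))
```

The gap between the two shard counts is the hardware cost of keeping everything in memory, which is why the response time goal matters.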

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search


Re: what s the optimum size of SOLR indexes

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2011-07-04 at 13:51 +0200, Jame Vaalet wrote:
> What would be the maximum size of a single SOLR index file for resulting in optimum search time ?

There is no clear answer. It depends on the number of (unique) terms,
number of documents, bytes on storage, storage speed, query complexity,
faceting, number of concurrent users and a lot of other factors.

> In case I have got to index all the documents in my repository  (which is in TB size) what would be the ideal architecture to follow , distributed SOLR ?

A TB in source documents might very well end up as a simple, single
machine index of 100GB or less. It depends on the amount of search
relevant information in the documents, rather than their size in bytes.

If your sources are Word-documents or a similar format with a relatively
large amount of stuffing and your searches are mostly simple "the user
enters 2-5 verbs and hits enter", my guess is that you don't need to
worry about distribution yet.

Make a pilot. Most of the work you'll have to do for a single machine
test can be reused for a distributed production setup.
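
One way to use such a pilot is to index a sample, measure the resulting index size, and extrapolate to the full repository. The sketch below assumes index size grows linearly with source size, which is only a rough approximation (term dictionaries tend to grow more slowly than the corpus); the sample numbers are invented.

```python
def estimate_full_index_gb(sample_source_gb, sample_index_gb, total_source_gb):
    """Linear extrapolation of index size from a pilot sample.

    Assumes the sample is representative of the full repository.
    """
    ratio = sample_index_gb / sample_source_gb
    return ratio * total_source_gb


# Invented pilot: 10 GB of source documents produced a 0.8 GB index.
estimate = estimate_full_index_gb(10, 0.8, 1000)
print(f"Estimated index for 1 TB of sources: {estimate:.0f} GB")
```

If the estimate lands well below what a single machine handles comfortably, distribution can wait.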


Re: what s the optimum size of SOLR indexes

Posted by arian487 <ak...@tagged.com>.
It depends on how many queries you'd be making per second.  For us, I
have a gradient of index sizes.  The first machine, which gets hit most
often, holds an index of about 2.5 gigs.  Most of the queries only ever
need to hit this index.  Then I have bigger indices of about 5-10 gigs each,
which are slower but don't get queried as often, so I can afford them to be a
little slower (and hence bigger).
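
A gradient like this could be queried tier by tier: try the small, fast index first and fall through to the larger ones only when it has no answer. That fallback rule is an assumption for illustration, not necessarily how the setup above routes queries, and the in-memory dictionaries stand in for real Solr indexes.

```python
def tiered_search(query, tiers, min_hits=1):
    """Try each tier in order and stop at the first with enough hits.

    tiers is a list of (name, search_function) pairs, smallest index first.
    """
    for name, search in tiers:
        hits = search(query)
        if len(hits) >= min_hits:
            return name, hits
    return None, []


# Stand-ins for a small "hot" index and a larger "cold" one.
hot = {"solr": ["doc1"]}
cold = {"solr": ["doc1", "doc7"], "lucene": ["doc9"]}
tiers = [
    ("hot", lambda q: hot.get(q, [])),
    ("cold", lambda q: cold.get(q, [])),
]

print(tiered_search("solr", tiers))
print(tiered_search("lucene", tiers))
```

Only queries the hot tier cannot answer pay the cost of the slower, larger index.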
