Posted to solr-user@lucene.apache.org by Thomas Dowling <td...@ohiolink.edu> on 2009/01/21 17:55:41 UTC

Sizing a Linux box for Solr?

Is there a useful guide somewhere that suggests system configurations
for machines that will support multiple large-ish Solr indexes?  I'm
working on a group of library databases (journal article citations +
abstracts, mostly), and need to provide some sort of helpful information
to our hardware people.  Other than "lots", is there an answer for "We
have X millions of records, of Y average size, with Z peak simultaneous
users, so the memory needed for reasonable search performance is _____"?
Or is the limiting factor on search performance going to be something else?

[Standard caveat: I did try checking the solr-user archives, but was
hampered by the fact that there's no search function.  The cobbler's
children go barefoot.]


-- 
Thomas Dowling
Ohio Library and Information Network
tdowling@ohiolink.edu

Re: Sizing a Linux box for Solr?

Posted by Alexander Ramos Jardim <al...@gmail.com>.
You will definitely want more than one box for your index.

Take a look at distributed search and multicore on the wiki.
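For reference, here is a minimal Python sketch of what a distributed request looks like. It assumes Solr 1.3-style distributed search, where the `shards` request parameter lists each shard as host:port/path, combined with multicore, where each core is addressed under its own path. The host names search1/search2, the core name core0 and the helper function are all hypothetical.

```python
from typing import List
from urllib.parse import urlencode


def distributed_query_url(base_url: str, shards: List[str], query: str, rows: int = 10) -> str:
    """Build a Solr select URL that fans one query out over several shards.

    Each shard is given as host:port/path with no http:// prefix.
    """
    params = {"q": query, "shards": ",".join(shards), "rows": rows}
    # Leave ':', '/' and ',' unescaped so the shard list stays readable.
    return base_url.rstrip("/") + "/select?" + urlencode(params, safe=":/,")


# Hypothetical hosts and core name: one core of the same database on each box.
url = distributed_query_url(
    "http://search1:8983/solr/core0",
    ["search1:8983/solr/core0", "search2:8983/solr/core0"],
    "journal abstracts",
)
print(url)
```

The node that receives the request queries every listed shard and merges the results, so each box only has to hold (and cache) its own slice of the index.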


2009/1/21 Thomas Dowling <td...@ohiolink.edu>

> On 01/21/2009 12:25 PM, Matthew Runo wrote:
> > At a certain level it will become better to have multiple smaller boxes
> > rather than one huge one. I've found that even an old P4 with 2 gigs of
> > ram has decent response time on our 150,000 item index with only a few
> > users - but it quickly goes downhill if we get more than 5 or 6. How
> > many documents are you going to be storing in your index? How many of
> > them will be "stored" versus "indexed"? Will you be faceting on the
> > results?
>
> Thanks for the tip on multiple boxes.  We'll be hosting about 20
> databases total.  A couple of them are in the 10- to 20-million record
> range and a couple more are in the 5- to 10-million range.  It's highly
> structured data and I anticipate a lot of faceting and indexing almost
> all the fields.
>
> >
> > In general, I'd recommend a 64 bit processor with enough ram to store
> > your index in ram - but that might not be possible with "millions" of
> > records. Our 150,000 item index is about a gig and a half when optimized
> > but yours will likely be different depending on how much you store.
> > Faceting takes more memory than pure searching as well.
> >
>
> This is very helpful.  Thanks again.
>
>
> --
> Thomas Dowling
>



-- 
Alexander Ramos Jardim

Re: Sizing a Linux box for Solr?

Posted by Erick Erickson <er...@gmail.com>.
One other useful piece of information would be how big you
expect your indexes to be. Which you should be able to estimate
quite easily by indexing, say, 20,000 documents from the
relevant databases.

Of particular interest will be the delta between the size of the
index at, say, 10,000 documents and at 20,000, since size is
related to the number of unique terms per field, and once you
get past a certain number of documents, virtually every term in a
new document will already be in your index.
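A rough Python sketch of that estimate, assuming you index two sample sets (say 10,000 and 20,000 documents) and measure the index directory on disk after each run. The function names are hypothetical, and the index path in the comment is only an example; use wherever your dataDir points. A straight-line extrapolation from the marginal per-document cost is only a first approximation, and may overestimate somewhat if the vocabulary is still growing between the two samples.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all files under `path` (e.g. a Solr index directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def extrapolate_index_size(n_small: int, bytes_small: int,
                           n_large: int, bytes_large: int,
                           n_target: int) -> float:
    """Estimate index size at n_target documents from two sample runs.

    Uses the marginal bytes per document between the two samples rather
    than the overall average, because the average still carries the early
    burst of new unique terms.
    """
    if n_large <= n_small:
        raise ValueError("n_large must be larger than n_small")
    marginal = (bytes_large - bytes_small) / (n_large - n_small)
    return bytes_large + (n_target - n_large) * marginal


# Hypothetical usage: call dir_size_bytes("solr/data/index") after indexing
# 10,000 documents and again after 20,000, then extrapolate.
```

With made-up measurements of 50 MB at 10,000 documents and 90 MB at 20,000, the marginal cost is 4,000 bytes per document and the estimate for 10 million documents comes out at about 40 GB.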

Also, I think that the relevant metric is what the size is for *unstored*
data since storing the fields isn't particularly relevant to search
response time (although it can *certainly* be relevant to
*total* time if you assemble a lot of stored fields to return).
If you're new to Lucene, the difference between stored and
indexed is a bit confusing, so if the above is gibberish, you'd
be well served by understanding the distinction before you go
too far <G>.

Best
Erick

On Wed, Jan 21, 2009 at 1:04 PM, Thomas Dowling <td...@ohiolink.edu> wrote:

> On 01/21/2009 12:25 PM, Matthew Runo wrote:
> > At a certain level it will become better to have multiple smaller boxes
> > rather than one huge one. I've found that even an old P4 with 2 gigs of
> > ram has decent response time on our 150,000 item index with only a few
> > users - but it quickly goes downhill if we get more than 5 or 6. How
> > many documents are you going to be storing in your index? How many of
> > them will be "stored" versus "indexed"? Will you be faceting on the
> > results?
>
> Thanks for the tip on multiple boxes.  We'll be hosting about 20
> databases total.  A couple of them are in the 10- to 20-million record
> range and a couple more are in the 5- to 10-million range.  It's highly
> structured data and I anticipate a lot of faceting and indexing almost
> all the fields.
>
> >
> > In general, I'd recommend a 64 bit processor with enough ram to store
> > your index in ram - but that might not be possible with "millions" of
> > records. Our 150,000 item index is about a gig and a half when optimized
> > but yours will likely be different depending on how much you store.
> > Faceting takes more memory than pure searching as well.
> >
>
> This is very helpful.  Thanks again.
>
>
> --
> Thomas Dowling
>

Re: Sizing a Linux box for Solr?

Posted by Thomas Dowling <td...@ohiolink.edu>.
On 01/21/2009 12:25 PM, Matthew Runo wrote:
> At a certain level it will become better to have multiple smaller boxes
> rather than one huge one. I've found that even an old P4 with 2 gigs of
> ram has decent response time on our 150,000 item index with only a few
> users - but it quickly goes downhill if we get more than 5 or 6. How
> many documents are you going to be storing in your index? How many of
> them will be "stored" versus "indexed"? Will you be faceting on the
> results?

Thanks for the tip on multiple boxes.  We'll be hosting about 20
databases total.  A couple of them are in the 10- to 20-million record
range and a couple more are in the 5- to 10-million range.  It's highly
structured data and I anticipate a lot of faceting and indexing almost
all the fields.

> 
> In general, I'd recommend a 64 bit processor with enough ram to store
> your index in ram - but that might not be possible with "millions" of
> records. Our 150,000 item index is about a gig and a half when optimized
> but yours will likely be different depending on how much you store.
> Faceting takes more memory than pure searching as well.
> 

This is very helpful.  Thanks again.


-- 
Thomas Dowling

Re: Sizing a Linux box for Solr?

Posted by Matthew Runo <mr...@zappos.com>.
At a certain level it will become better to have multiple smaller  
boxes rather than one huge one. I've found that even an old P4 with 2  
gigs of ram has decent response time on our 150,000 item index with  
only a few users - but it quickly goes downhill if we get more than 5  
or 6. How many documents are you going to be storing in your index?  
How many of them will be "stored" versus "indexed"? Will you be
faceting on the results?
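One way to find where your own hardware starts to degrade is to replay queries at increasing concurrency and watch the latencies. A minimal Python sketch, assuming `run_query` is any zero-argument function that sends one search (for example a `urllib.request.urlopen(...).read()` call against your Solr select URL); the function name is hypothetical and this is not a Solr tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def latencies_under_load(run_query: Callable[[], object], users: int,
                         queries_per_user: int) -> List[float]:
    """Run queries from `users` concurrent threads; return every latency in seconds."""
    def one_user(_user_index: int) -> List[float]:
        times = []
        for _ in range(queries_per_user):
            start = time.perf_counter()
            run_query()
            times.append(time.perf_counter() - start)
        return times

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    # Flatten the per-thread lists into one list of latencies.
    return [t for times in per_user for t in times]
```

Calling it with users = 1, 2, 4, 8 and comparing the medians (`statistics.median`) should show roughly where response time starts to climb on a given box.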

In general, I'd recommend a 64 bit processor with enough ram to store  
your index in ram - but that might not be possible with "millions" of  
records. Our 150,000 item index is about a gig and a half when  
optimized but yours will likely be different depending on how much you  
store. Faceting takes more memory than pure searching as well.
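The 150,000-item figure can be turned into a back-of-envelope number. This Python sketch assumes index size scales linearly with document count (it won't exactly, since it depends on unique terms and on what is stored), takes 1 GB as 10^9 bytes, and uses a hypothetical 16 GB of RAM per box available for holding the index.

```python
import math


def naive_index_size(sample_docs: int, sample_bytes: float, target_docs: int) -> float:
    """Scale a measured index size linearly to a larger document count."""
    return sample_bytes / sample_docs * target_docs


def boxes_needed(index_bytes: float, ram_per_box_bytes: float) -> int:
    """Number of machines needed if each must hold its share of the index in RAM."""
    return math.ceil(index_bytes / ram_per_box_bytes)


GB = 10**9  # assumption: decimal gigabytes

# ~1.5 GB for 150,000 items (optimized), scaled to a 20-million-record database.
size = naive_index_size(150_000, 1.5 * GB, 20_000_000)
print(size / GB)                    # → 200.0
print(boxes_needed(size, 16 * GB))  # → 13 (16 GB per box is hypothetical)
```

That works out to about 10,000 bytes per document, so the largest databases could land in the hundreds of gigabytes, which is why splitting across several smaller machines keeps coming up.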

I'm sure that we could work out some better suggestions with more  
information about your use case.

http://www.nabble.com/Solr---User-f14480.html is a great place to go  
for searching the solr user list.

-Matthew

On Jan 21, 2009, at 8:55 AM, Thomas Dowling wrote:

> Is there a useful guide somewhere that suggests system configurations
> for machines that will support multiple large-ish Solr indexes?  I'm
> working on a group of library databases (journal article citations +
> abstracts, mostly), and need to provide some sort of helpful information
> to our hardware people.  Other than "lots", is there an answer for "We
> have X millions of records, of Y average size, with Z peak simultaneous
> users, so the memory needed for reasonable search performance is _____"?
> Or is the limiting factor on search performance going to be something else?
>
> [Standard caveat: I did try checking the solr-user archives, but was
> hampered by the fact that there's no search function.  The cobbler's
> children go barefoot.]
>
>
> -- 
> Thomas Dowling
> Ohio Library and Information Network
> tdowling@ohiolink.edu
>