Posted to solr-user@lucene.apache.org by Phil Hagelberg <ph...@hagelb.org> on 2009/11/17 02:42:49 UTC

core size

I'm planning out a system with large indexes and wondering what kind
of performance boost I'd see if I split documents out into many cores
rather than using a single core and partitioning by a field. I've got
about 500GB worth of indexes, ranging from 100MB to 50GB each.

I'm assuming that if we split them out into multiple cores we would see
the most dramatic benefit in searches on the smaller cores, but I'm
wondering what level of speedup to expect. The cores will eventually be
split up anyway; I'm just trying to determine how to prioritize the
work.
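
For concreteness, here's roughly what I mean by the two layouts (the
core names, field name, and paths are made up for illustration):

Single core, split by a field:

  http://localhost:8983/solr/select?q=body:foo&fq=customer:acme

One core per customer, declared in solr.xml:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="acme" instanceDir="cores/acme"/>
      <core name="bigco" instanceDir="cores/bigco"/>
    </cores>
  </solr>

  http://localhost:8983/solr/acme/select?q=body:foo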

thanks,
Phil

Re: core size

Posted by Lance Norskog <go...@gmail.com>.
Been there, done that.

Indexing into the smaller cores will be faster.
You will be able to spread the load across multiple machines.

There are other advantages:
You will not have a half-terabyte set of files to worry about.
You will not need 1.1TB free in one partition to run an optimize.
You will not need 12+ hours to run an optimize (each core can be
optimized on its own; see the sketch below).
It will not take half an hour to copy a newly optimized index to a
query server.
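
For example, optimizing one core at a time instead of rewriting all
500GB at once. A sketch, assuming the stock XML update handler (host,
port, and core name are just examples):

  curl http://localhost:8983/solr/core0/update \
       -H 'Content-Type: text/xml' --data-binary '<optimize/>'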




-- 
Lance Norskog
goksron@gmail.com

Re: core size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
If an index fits in memory, I am guessing you'll see the speed change roughly in proportion to the size of the index.  If an index does not fit in memory (i.e. the disk head has to seek around the disk to find data), then the improvement will be even greater.  I haven't explicitly tested this and am hoping somebody will correct me if this is wrong.
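
A crude way to tell which case you're in is to compare the on-disk size
of the index directory against the RAM left over for the OS page cache
(the path below is just an example):

  du -sh /var/solr/data/index    # on-disk size of one core's index
  free -g                        # memory free/cached for the page cache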

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


