Posted to solr-user@lucene.apache.org by Amit Nithian <an...@gmail.com> on 2010/08/31 01:52:28 UTC

Hardware Specs Question

Hi all,

I am curious to get some opinions on at what point having more CPU
cores shows diminishing returns in terms of QPS. Our index size is about 8GB
and we have 16GB of RAM on a quad-core machine (4 x 2.4 GHz AMD Opteron 2216).
Currently I have the heap set to 8GB.

We are looking to get more servers to increase capacity, and because the
warranty on our old servers is about to expire. Before asking for a certain
spec, I was curious what others run and at what point having more cores
ceases to matter. Mainly looking at somewhere between 4-12 cores per
server.

Thanks!
Amit

Re: Hardware Specs Question

Posted by Shawn Heisey <so...@elyograg.org>.
  On 9/3/2010 3:39 AM, Toke Eskildsen wrote:
> I'll have to extrapolate a lot here (also known as guessing).
> You don't mention what kind of harddrives you're using, so let's say
> 15.000 RPM to err on the high-end side. Compared to the 2 drives @
> 15.000 RPM in RAID 1 we've experimented with, the difference is that the
> striping allows for concurrency when the different reads are on
> different physical drives (sorry if this is basic, I'm just trying to
> establish a common understanding here).
>
> The chance for 2 concurrent reads to be on different drives with 3
> harddrives is 5/6, the chance for 3 concurrent reads is 1/6 and the
> chance for 3 concurrent reads to be on at least 2 drives is 5/6. For the
> sake of argument, let's say that the 3 * striping gives us double the
> concurrency I/O.

I actually didn't know that there were 15,000 RPM SATA drives until just 
now when I googled.  I knew that Western Digital made some 10,000 RPM 
models, but most SATA drives are 7200 RPM.  Dell doesn't sell any SATA 
drives faster than 7200, and the 500GB drives in my servers are 7200.  I'm 
using the maximum 1MB stripe size to increase the likelihood of concurrent 
reads.  
Our query rate is quite low (less than 1 per second), so any concurrency 
that's achieved will be limited to possibly allowing all three VMs on 
the server to access the disk at the exact same time.  With three 
stripes and two copies of each of those stripes, the chance of that is 
fair to good.

So with all that, I probably only see around a third (and possibly up to 
half) the performance of SSDs.  Thanks!


Re: Hardware Specs Question

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote:
> On 9/2/2010 2:54 AM, Toke Eskildsen wrote:
> > We've done a fair amount of experimentation in this area (1997-era SSDs
> > vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
> > RAID 0). The harddisk setups never stood a chance for searching. With
> > current SSD's being faster than harddisks for writes too, they'll also
> > be better for index building, although not as impressive as for
> > searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware
> 
> How does it compare to six SATA drives in a Dell hardware RAID10?  

I'll have to extrapolate a lot here (also known as guessing).

You don't mention what kind of harddrives you're using, so let's say
15.000 RPM to err on the high-end side. Compared to the 2 drives @
15.000 RPM in RAID 1 we've experimented with, the difference is that the
striping allows for concurrency when the different reads are on
different physical drives (sorry if this is basic, I'm just trying to
establish a common understanding here).

The chance for 2 concurrent reads to be on different drives with 3
harddrives is 5/6, the chance for 3 concurrent reads is 1/6 and the
chance for 3 concurrent reads to be on at least 2 drives is 5/6. For the
sake of argument, let's say that the 3 * striping gives us double the
concurrency I/O.
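
A quick way to sanity-check figures like these is a small simulation.
The sketch below is exactly that, a sketch: it assumes 6 physical drives
(3 mirrored pairs in the RAID 10) and that each concurrent read lands on
a uniformly random physical drive, which is of course a simplification
of what the controller actually does.

import random

def p_distinct_exact(drives, reads):
    """Exact probability that `reads` concurrent requests hit `reads`
    distinct drives, with each request picking a drive uniformly."""
    p = 1.0
    for i in range(reads):
        p *= (drives - i) / drives
    return p

def p_distinct_sim(drives, reads, trials=200_000):
    """Monte Carlo estimate of the same probability, as a cross-check."""
    hits = 0
    for _ in range(trials):
        if len({random.randrange(drives) for _ in range(reads)}) == reads:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # 6 physical drives = 3 mirrored pairs in a RAID 10 (an assumption).
    for reads in (2, 3):
        print(reads, "reads on distinct drives:",
              round(p_distinct_exact(6, reads), 3),
              "~", round(p_distinct_sim(6, reads), 3))

With 6 physical drives this gives 5/6 for two concurrent reads, which
lines up with the 5/6 above if each read can be served by either half of
a mirrored pair; the three-read cases depend on exactly which event you
count, which is why I only use them for the rough "double the
concurrency I/O" estimate above.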

Taking my old measurements at face value and doubling the numbers for
the 15.000 RPM measurements, this would bring six 15.000 RPM SATA
drives in the RAID 10 up to a throughput that is 1/3 - 2/3 of the SSD,
depending on how we measure.


Some general observations:

With long runtimes, the throughput for harddisks rises relative to the
SSD as the disk cache gets warmed. If there are frequent index updates
with deletions, the SSD gains more ground, as it is not nearly as
dependent on the disk cache as harddisks are.

With small indexes, the difference between harddisks and SSD is
relatively small as the disk cache quickly gets filled. Consequently the
difference increases for large indexes.


One point to note about RAID is that it does not improve the speed of
single searches on a single index: it does not lower the seek time for a
single small I/O request, and searching a single index is done with a
number of small successive requests. If the performance problem is long
search time, RAID does not help (but in combination with sharding or
similar it will). If the problem is the number of concurrent searches,
RAID helps.
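
As a toy illustration of that last point (the figures below are made up
for the example, not measurements): the latency of one search is set by
the number of small reads it has to do one after the other, while extra
spindles mainly let more searches run side by side.

# Toy model: striping helps concurrent searches, not single-search latency.
# Both figures below are assumptions for illustration.
seek_ms = 8.0          # assumed seek time per small read
seeks_per_query = 50   # assumed number of small successive reads per search

single_query_ms = seeks_per_query * seek_ms
print(f"single query: ~{single_query_ms:.0f} ms regardless of drive count")

for drives in (1, 3, 6):
    # Independent queries can be spread over independent spindles.
    qps = drives * 1000.0 / single_query_ms
    print(f"{drives} drive(s) -> roughly {qps:.1f} concurrent queries/second")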


RE: Hardware Specs Question

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Very interesting stuff!

I'm pretty sure that everything for front-line use in intense applications will be non-hard-disk within 10 years or sooner, with hard disks kept for backup/boot.

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/6/10, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:

> From: Toke Eskildsen <te...@statsbiblioteket.dk>
> Subject: RE: Hardware Specs Question
> To: "Dennis Gearon" <ge...@sbcglobal.net>, "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Date: Monday, September 6, 2010, 12:35 PM
> From: Dennis Gearon [gearond@sbcglobal.net]:
> > I wouldn't have thought that CPU was a big deal with
> the speed/cores of CPU's
> > continuously growing according to Moore's law and the
> change in Disk Speed
> > barely changine 50% in 15 years. Must have a lot to do
> with caching.
> 
> I am not sure I follow you? When seek times are suddenly a
> 100 times faster (slight exaggeration, but only slight) why
> wouldn't it cause the bottleneck to move? Yes, CPU's has
> increased tremendously in speed, but so has our processing
> needs. Lucene (and by extension Solr) was made with long
> seek times in mind and looking at the current marked, it
> makes sense to continue supporting this for some years. If
> the software was optimized for sub-ms seek times, it might
> lower CPU usage or at the very least lower the need for
> caching (internal as well as external).
> 
> > What size indexes are you working with?
> 
> Around 40GB for our primary index. 9 million documents,
> AFAIR.
> 
> > Are you saying you can get the whole thing in memory?
> 
> No. For that test we had to reduce the index to 14GB on our
> 24GB test machine with Lucene's RAMDirectory. In order to
> avoid the "everything is cached and thus everything is the
> same speed"-problem, we lowered the amount of available
> memory to 3GB when we measured harddisk & SSD speed
> against the 14GB index. The Cliff notes is harddisks 200 raw
> queries/second, SSDs 774 q/sec and RAM 952 q/s, but as
> always it is not so simple to extract a single number for
> performance when warm up and caching comes into play. Let me
> be quick to add that this was with Lucene + custom code, not
> with Solr.
> 
> > That would negate almost any disk benefits.
> 
> That depends very much on your setup. It takes a fair
> amount of time to copy 14GB from storage into RAM so an
> index fully in RAM would either be very static or require
> some logic to handle updates and sync data in case of
> outages. I know there's some interesting work being done
> with this, but as SSDs are a lot cheaper than RAM and
> fulfill our needs, it is not something we pursue.
> 

RE: Hardware Specs Question

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
From: Dennis Gearon [gearond@sbcglobal.net]:
> I wouldn't have thought that CPU was a big deal with the speed/cores of CPU's
> continuously growing according to Moore's law and the change in Disk Speed
> barely changine 50% in 15 years. Must have a lot to do with caching.

I am not sure I follow you. When seek times are suddenly 100 times faster (slight exaggeration, but only slight), why wouldn't the bottleneck move? Yes, CPUs have increased tremendously in speed, but so have our processing needs. Lucene (and by extension Solr) was made with long seek times in mind, and looking at the current market, it makes sense to continue supporting this for some years. If the software were optimized for sub-ms seek times, it might lower CPU usage or at the very least lower the need for caching (internal as well as external).

> What size indexes are you working with?

Around 40GB for our primary index. 9 million documents, AFAIR.

> Are you saying you can get the whole thing in memory?

No. For that test we had to reduce the index to 14GB on our 24GB test machine with Lucene's RAMDirectory. In order to avoid the "everything is cached and thus everything is the same speed" problem, we lowered the amount of available memory to 3GB when we measured harddisk & SSD speed against the 14GB index. The Cliff's Notes version: harddisks 200 raw queries/second, SSDs 774 q/s and RAM 952 q/s, but as always it is not so simple to extract a single number for performance when warm-up and caching come into play. Let me be quick to add that this was with Lucene + custom code, not with Solr.

> That would negate almost any disk benefits.

That depends very much on your setup. It takes a fair amount of time to copy 14GB from storage into RAM so an index fully in RAM would either be very static or require some logic to handle updates and sync data in case of outages. I know there's some interesting work being done with this, but as SSDs are a lot cheaper than RAM and fulfill our needs, it is not something we pursue.
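
To put rough numbers on the "fair amount of time" and on the throughput figures above (the sequential read speed below is an assumption for illustration, not something we measured):

# Back-of-envelope numbers for the discussion above.  The read
# throughput is an assumed figure, not a measurement.
index_gb = 14
seq_read_mb_per_s = 150        # assumed sustained sequential read speed

load_seconds = index_gb * 1024 / seq_read_mb_per_s
print(f"Loading {index_gb}GB into RAM at {seq_read_mb_per_s}MB/s "
      f"takes about {load_seconds / 60:.1f} minutes")

# Relative throughput from the figures quoted above (Lucene + custom code).
hdd_qps, ssd_qps, ram_qps = 200, 774, 952
print(f"SSD vs harddisk: {ssd_qps / hdd_qps:.1f}x")
print(f"RAM vs SSD:      {ram_qps / ssd_qps:.2f}x")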

Re: Hardware Specs Question

Posted by scott chu <sc...@udngroup.com>.
well balanced system
=================
Agree. We'll start a performance & load test here this month. I've defined 
test criteria of 'qps', 'RTpQ' & worst case according to our use case & past 
experience. Our goal is to pursue these criteria & adjust hardware & system 
configuration to find a well-balanced, scalable Solr architecture.

The past discussion in this thread has several good suggestions for 
our test. Thanks to all who provided their experience & suggestions.
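
For the load test, something like the minimal harness below is what I have 
in mind for measuring QPS and per-query response times. The Solr URL, core 
name and query list are placeholders, not our real setup, and a proper run 
should replay realistic query logs rather than a short fixed list.

import time
import statistics
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholders: point these at your own Solr instance and query set.
SOLR_URL = "http://localhost:8983/solr/collection1/select"
QUERIES = ["title:lucene", "body:search", "author:chu"]
CONCURRENT_CLIENTS = 8
REQUESTS_PER_CLIENT = 100

def run_client(client_id):
    """Fire queries sequentially and record the latency of each one."""
    latencies = []
    for i in range(REQUESTS_PER_CLIENT):
        q = QUERIES[(client_id + i) % len(QUERIES)]
        url = SOLR_URL + "?" + urllib.parse.urlencode({"q": q, "wt": "json"})
        start = time.time()
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
        latencies.append(time.time() - start)
    return latencies

if __name__ == "__main__":
    wall_start = time.time()
    with ThreadPoolExecutor(max_workers=CONCURRENT_CLIENTS) as pool:
        results = list(pool.map(run_client, range(CONCURRENT_CLIENTS)))
    wall = time.time() - wall_start

    latencies = sorted(l for client in results for l in client)
    total = len(latencies)
    print(f"QPS:            {total / wall:.1f}")
    print(f"median latency: {statistics.median(latencies) * 1000:.1f} ms")
    print(f"95th pct:       {latencies[int(total * 0.95)] * 1000:.1f} ms")

A thread pool of plain blocking HTTP clients is crude, but it is enough to 
get ballpark QPS and latency percentiles for comparing hardware 
configurations.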

Scott

----- Original Message ----- 
From: "Toke Eskildsen" <te...@statsbiblioteket.dk>
To: <so...@lucene.apache.org>
Sent: Friday, September 03, 2010 6:43 PM
Subject: Re: Hardware Specs Question


> On Fri, 2010-09-03 at 11:07 +0200, Dennis Gearon wrote:
>> If you really want to see performance, try external DRAM disks.
>> Whew! 800X faster than a disk.
>
> As sexy as they are, the DRAM drives does not buy much more extra
> performance. At least not at the search stage. For searching, SSDs are
> not that far from holding the index fully in RAM (about 3/4 the speed in
> our tests but YMMV). The CPU is the bottleneck.
>
> That was with Lucene 2.4 so the relative numbers might have changed, but
> the old lesson still stands: A well balanced system is key.
>
> 


Re: Hardware Specs Question

Posted by Dennis Gearon <ge...@sbcglobal.net>.
I wouldn't have thought that CPU was a big deal, with the speed/cores of CPUs continuously growing according to Moore's law and disk speed barely changing 50% in 15 years. Must have a lot to do with caching.

What size indexes are you working with? Are you saying you can get the whole thing in memory? That would negate almost any disk benefits.

I'm guessing that keeping shards small enough to fit into memory must be one of the big tricks.


Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 9/3/10, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:

> From: Toke Eskildsen <te...@statsbiblioteket.dk>
> Subject: Re: Hardware Specs Question
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Date: Friday, September 3, 2010, 3:43 AM
> On Fri, 2010-09-03 at 11:07 +0200,
> Dennis Gearon wrote:
> > If you really want to see performance, try external
> DRAM disks.
> > Whew! 800X faster than a disk.
> 
> As sexy as they are, the DRAM drives does not buy much more
> extra
> performance. At least not at the search stage. For
> searching, SSDs are
> not that far from holding the index fully in RAM (about 3/4
> the speed in
> our tests but YMMV). The CPU is the bottleneck.
> 
> That was with Lucene 2.4 so the relative numbers might have
> changed, but
> the old lesson still stands: A well balanced system is
> key.
> 
> 

Re: Hardware Specs Question

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2010-09-03 at 11:07 +0200, Dennis Gearon wrote:
> If you really want to see performance, try external DRAM disks.
> Whew! 800X faster than a disk.

As sexy as they are, DRAM drives do not buy much extra
performance. At least not at the search stage. For searching, SSDs are
not that far from holding the index fully in RAM (about 3/4 the speed in
our tests, but YMMV). The CPU is the bottleneck.

That was with Lucene 2.4 so the relative numbers might have changed, but
the old lesson still stands: A well balanced system is key.


Re: Hardware Specs Question

Posted by Dennis Gearon <ge...@sbcglobal.net>.
If you really want to see performance, try external DRAM disks. Whew! 800X faster than a disk.


Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 9/2/10, Shawn Heisey <so...@elyograg.org> wrote:

> From: Shawn Heisey <so...@elyograg.org>
> Subject: Re: Hardware Specs Question
> To: solr-user@lucene.apache.org
> Date: Thursday, September 2, 2010, 6:45 PM
>  On 9/2/2010 2:54 AM, Toke Eskildsen
> wrote:
> > We've done a fair amount of experimentation in this
> area (1997-era SSDs
> > vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000
> RPM harddisks in
> > RAID 0). The harddisk setups never stood a chance for
> searching. With
> > current SSD's being faster than harddisks for writes
> too, they'll also
> > be better for index building, although not as
> impressive as for
> > searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware
> 
> How does it compare to six SATA drives in a Dell hardware
> RAID10?  That's what my VM hosts have, which each run
> three large shards and a couple of supporting systems.
> 
> 
> 

Re: Hardware Specs Question

Posted by Shawn Heisey <so...@elyograg.org>.
  On 9/2/2010 2:54 AM, Toke Eskildsen wrote:
> We've done a fair amount of experimentation in this area (1997-era SSDs
> vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
> RAID 0). The harddisk setups never stood a chance for searching. With
> current SSD's being faster than harddisks for writes too, they'll also
> be better for index building, although not as impressive as for
> searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware

How does it compare to six SATA drives in a Dell hardware RAID10?  
That's what my VM hosts have, which each run three large shards and a 
couple of supporting systems.



Re: Hardware Specs Question

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2010-09-02 at 03:37 +0200, Lance Norskog wrote:
> I don't know how much SSD disks cost, but they will certainly cure the
> disk i/o problem.

We've done a fair amount of experimentation in this area (1997-era SSDs
vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
RAID 0). The harddisk setups never stood a chance for searching. With
current SSD's being faster than harddisks for writes too, they'll also
be better for index building, although not as impressive as for
searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware

With consumer-level SSDs, there is more bang for the buck than RAIDing
up with high-end harddisks. They should be the first choice when IO is
an issue.


There are of course opposing views on this issue. Some people think
enterprise: expensive and very reliable systems where consumer hardware
is a big no-no. The price point for pro SSDs might make them unfeasible
in such a setup. Others go for cheaper setups and handle the reliability
issues with redundancy. I'm firmly in the second camp, but it is
obviously not an option for everyone.

A point of concern is writes. Current consumer SSDs use wear leveling
and can take a lot of punishment (as a rough measure: the amount of
free space times 10.000). They might not be suitable for holding
massive databases with thousands of writes/second, but they can surely
handle the measly amount of writes required for Lucene index updating
and searching.
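
To make that rough measure concrete, a tiny example of the arithmetic;
the free-space and daily-write figures are made-up illustrations, not
measurements:

# Illustration of the rule of thumb above ("free space times 10.000").
# The input figures are assumptions, not measurements.
free_space_gb = 100            # capacity kept free/unused on the SSD
endurance_factor = 10_000      # rough write endurance measure from above
daily_writes_gb = 50           # assumed index updates + merges per day

total_write_budget_gb = free_space_gb * endurance_factor
print(f"Write budget: ~{total_write_budget_gb / 1024:.0f} TB")
print(f"At {daily_writes_gb}GB/day that lasts roughly "
      f"{total_write_budget_gb / daily_writes_gb / 365:.0f} years")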

Long story short: put a quality consumer SSD in each server and be
happy.


Re: Hardware Specs Question

Posted by Lance Norskog <go...@gmail.com>.
I was just reading about configuring mass computation grids: hardware
writes on 2 striped disks take 10% longer than writes on a single disk,
because you have to wait for the slower disk to finish. So, for writes,
single disks without RAID are faster.

I don't know how much SSDs cost, but they will certainly cure the
disk I/O problem.

On Tue, Aug 31, 2010 at 1:35 AM, scott chu (朱炎詹) <sc...@udngroup.com> wrote:
> In our current lab project, we already built a Chinese newspaper index with
> 18 millions documents. The index size is around 51GB. So I am very concerned
> about the memory issue you guys mentioned.
>
> I also look up the Hathitrust report on SolrPerformanceData page:
> http://wiki.apache.org/solr/SolrPerformanceData. They said their main
> bottleneck is Disk-I/O even they have 10 shards spread over 4 servers.
>
> Can you guys give me some helpful suggestion about hardward spec & memory
> configuration on our project?
>
> Thanks in advance.
>
> Scott
>
> ----- Original Message ----- From: "Lance Norskog" <go...@gmail.com>
> To: <so...@lucene.apache.org>
> Sent: Tuesday, August 31, 2010 1:01 PM
> Subject: Re: Hardware Specs Question
>
>
> There are synchronization points, which become chokepoints at some
> number of cores. I don't know where they cause Lucene to top out.
> Lucene apps are generally disk-bound, not CPU-bound, but yours will
> be. There are so many variables that it's really not possible to give
> any numbers.
>
> Lance
>
> On Mon, Aug 30, 2010 at 8:34 PM, Amit Nithian <an...@gmail.com> wrote:
>>
>> Lance,
>>
>> makes sense and I have heard about the long GC times on large heaps but I
>> personally haven't experienced a slowdown but that doesn't mean anything
>> either :-). Agreed that tuning the SOLR caching is the way to go.
>>
>> I haven't followed all the solr/lucene changes but from what I remember
>> there are synchronization points that could be a bottleneck where adding
>> more cores won't help this problem? Or am I completely missing something.
>>
>> Thanks again
>> Amit
>>
>> On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹)
>> <sc...@udngroup.com>wrote:
>>
>>> I am also curious as Amit does. Can you make an example about the garbage
>>> collection problem you mentioned?
>>>
>>> ----- Original Message ----- From: "Lance Norskog" <go...@gmail.com>
>>> To: <so...@lucene.apache.org>
>>> Sent: Tuesday, August 31, 2010 9:14 AM
>>> Subject: Re: Hardware Specs Question
>>>
>>>
>>>
>>> It generally works best to tune the Solr caches and allocate enough
>>>>
>>>> RAM to run comfortably. Linux & Windows et. al. have their own cache
>>>> of disk blocks. They use very good algorithms for managing this cache.
>>>> Also, they do not make long garbage collection passes.
>>>>
>>>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <an...@gmail.com>
>>>> wrote:
>>>>
>>>>> Lance,
>>>>>
>>>>> Thanks for your help. What do you mean by that the OS can keep the
>>>>> index
>>>>> in
>>>>> memory better than Solr? Do you mean that you should use another means
>>>>> to
>>>>> keep the index in memory (i.e. ramdisk)? Is there a generally accepted
>>>>> heap
>>>>> size/index size that you follow?
>>>>>
>>>>> Thanks
>>>>> Amit
>>>>>
>>>>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <go...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> The price-performance knee for small servers is 32G ram, 2-6 SATA
>>>>>>
>>>>>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>>>>>> them, leaving room for expansion.
>>>>>>
>>>>>> I have not done benchmarks about the max # of processors that can be
>>>>>> kept busy during indexing or querying, and the total numbers: QPS,
>>>>>> response time averages & variability, etc.
>>>>>>
>>>>>> If your index file size is 8G, and your Java heap is 8G, you will do
>>>>>> long garbage collection cycles. The operating system is very good at
>>>>>> keeping your index in memory- better than Solr can.
>>>>>>
>>>>>> Lance
>>>>>>
>>>>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi all,
>>>>>> >
>>>>>> > I am curious to know get some opinions on at what point having more CPU
>>>>>> > cores shows diminishing returns in terms of QPS. Our index size is about 8GB
>>>>>> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
>>>>>> > Currently I have the heap to 8GB.
>>>>>> >
>>>>>> > We are looking to get more servers to increase capacity and because the
>>>>>> > warranty is set to expire on our old servers and so I was curious before
>>>>>> > asking for a certain spec what others run and at what point does having more
>>>>>> > cores cease to matter? Mainly looking at somewhere between 4-12 cores per
>>>>>> > server.
>>>>>> >
>>>>>> > Thanks!
>>>>>> > Amit
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Lance Norskog
>>>>>> goksron@gmail.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goksron@gmail.com
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
>
>
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Hardware Specs Question

Posted by "scott chu (朱炎詹)" <sc...@udngroup.com>.
In our current lab project, we have already built a Chinese newspaper index 
with 18 million documents. The index size is around 51GB, so I am very 
concerned about the memory issue you guys mentioned.

I also looked up the Hathitrust report on the SolrPerformanceData page: 
http://wiki.apache.org/solr/SolrPerformanceData. They said their main 
bottleneck is disk I/O even though they have 10 shards spread over 4 servers.

Can you guys give me some helpful suggestions about hardware spec & memory 
configuration for our project?

Thanks in advance.

Scott

----- Original Message ----- 
From: "Lance Norskog" <go...@gmail.com>
To: <so...@lucene.apache.org>
Sent: Tuesday, August 31, 2010 1:01 PM
Subject: Re: Hardware Specs Question


There are synchronization points, which become chokepoints at some
number of cores. I don't know where they cause Lucene to top out.
Lucene apps are generally disk-bound, not CPU-bound, but yours will
be. There are so many variables that it's really not possible to give
any numbers.

Lance

On Mon, Aug 30, 2010 at 8:34 PM, Amit Nithian <an...@gmail.com> wrote:
> Lance,
>
> makes sense and I have heard about the long GC times on large heaps but I
> personally haven't experienced a slowdown but that doesn't mean anything
> either :-). Agreed that tuning the SOLR caching is the way to go.
>
> I haven't followed all the solr/lucene changes but from what I remember
> there are synchronization points that could be a bottleneck where adding
> more cores won't help this problem? Or am I completely missing something.
>
> Thanks again
> Amit
>
> On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹) 
> <sc...@udngroup.com>wrote:
>
>> I am also curious as Amit does. Can you make an example about the garbage
>> collection problem you mentioned?
>>
>> ----- Original Message ----- From: "Lance Norskog" <go...@gmail.com>
>> To: <so...@lucene.apache.org>
>> Sent: Tuesday, August 31, 2010 9:14 AM
>> Subject: Re: Hardware Specs Question
>>
>>
>>
>> It generally works best to tune the Solr caches and allocate enough
>>> RAM to run comfortably. Linux & Windows et. al. have their own cache
>>> of disk blocks. They use very good algorithms for managing this cache.
>>> Also, they do not make long garbage collection passes.
>>>
>>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <an...@gmail.com> 
>>> wrote:
>>>
>>>> Lance,
>>>>
>>>> Thanks for your help. What do you mean by that the OS can keep the 
>>>> index
>>>> in
>>>> memory better than Solr? Do you mean that you should use another means 
>>>> to
>>>> keep the index in memory (i.e. ramdisk)? Is there a generally accepted
>>>> heap
>>>> size/index size that you follow?
>>>>
>>>> Thanks
>>>> Amit
>>>>
>>>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <go...@gmail.com>
>>>> wrote:
>>>>
>>>> The price-performance knee for small servers is 32G ram, 2-6 SATA
>>>>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>>>>> them, leaving room for expansion.
>>>>>
>>>>> I have not done benchmarks about the max # of processors that can be
>>>>> kept busy during indexing or querying, and the total numbers: QPS,
>>>>> response time averages & variability, etc.
>>>>>
>>>>> If your index file size is 8G, and your Java heap is 8G, you will do
>>>>> long garbage collection cycles. The operating system is very good at
>>>>> keeping your index in memory- better than Solr can.
>>>>>
>>>>> Lance
>>>>>
>>>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com>
>>>>> wrote:
>>>>> > Hi all,
>>>>> >
>>>>> > I am curious to know get some opinions on at what point having more CPU
>>>>> > cores shows diminishing returns in terms of QPS. Our index size is about 8GB
>>>>> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
>>>>> > Currently I have the heap to 8GB.
>>>>> >
>>>>> > We are looking to get more servers to increase capacity and because the
>>>>> > warranty is set to expire on our old servers and so I was curious before
>>>>> > asking for a certain spec what others run and at what point does having more
>>>>> > cores cease to matter? Mainly looking at somewhere between 4-12 cores per
>>>>> > server.
>>>>> >
>>>>> > Thanks!
>>>>> > Amit
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lance Norskog
>>>>> goksron@gmail.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>>
>>>
>>
>>
>>
>>
>



-- 
Lance Norskog
goksron@gmail.com





Re: Hardware Specs Question

Posted by Lance Norskog <go...@gmail.com>.
There are synchronization points, which become chokepoints at some
number of cores. I don't know where they cause Lucene to top out.
Lucene apps are generally disk-bound, not CPU-bound, but yours will be
CPU-bound. There are so many variables that it's really not possible to
give any numbers.

Lance

On Mon, Aug 30, 2010 at 8:34 PM, Amit Nithian <an...@gmail.com> wrote:
> Lance,
>
> makes sense and I have heard about the long GC times on large heaps but I
> personally haven't experienced a slowdown but that doesn't mean anything
> either :-). Agreed that tuning the SOLR caching is the way to go.
>
> I haven't followed all the solr/lucene changes but from what I remember
> there are synchronization points that could be a bottleneck where adding
> more cores won't help this problem? Or am I completely missing something.
>
> Thanks again
> Amit
>
> On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹) <sc...@udngroup.com>wrote:
>
>> I am also curious as Amit does. Can you make an example about the garbage
>> collection problem you mentioned?
>>
>> ----- Original Message ----- From: "Lance Norskog" <go...@gmail.com>
>> To: <so...@lucene.apache.org>
>> Sent: Tuesday, August 31, 2010 9:14 AM
>> Subject: Re: Hardware Specs Question
>>
>>
>>
>>  It generally works best to tune the Solr caches and allocate enough
>>> RAM to run comfortably. Linux & Windows et. al. have their own cache
>>> of disk blocks. They use very good algorithms for managing this cache.
>>> Also, they do not make long garbage collection passes.
>>>
>>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <an...@gmail.com> wrote:
>>>
>>>> Lance,
>>>>
>>>> Thanks for your help. What do you mean by that the OS can keep the index
>>>> in
>>>> memory better than Solr? Do you mean that you should use another means to
>>>> keep the index in memory (i.e. ramdisk)? Is there a generally accepted
>>>> heap
>>>> size/index size that you follow?
>>>>
>>>> Thanks
>>>> Amit
>>>>
>>>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <go...@gmail.com>
>>>> wrote:
>>>>
>>>>  The price-performance knee for small servers is 32G ram, 2-6 SATA
>>>>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>>>>> them, leaving room for expansion.
>>>>>
>>>>> I have not done benchmarks about the max # of processors that can be
>>>>> kept busy during indexing or querying, and the total numbers: QPS,
>>>>> response time averages & variability, etc.
>>>>>
>>>>> If your index file size is 8G, and your Java heap is 8G, you will do
>>>>> long garbage collection cycles. The operating system is very good at
>>>>> keeping your index in memory- better than Solr can.
>>>>>
>>>>> Lance
>>>>>
>>>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com>
>>>>> wrote:
>>>>> > Hi all,
>>>>> >
>>>>> > I am curious to know get some opinions on at what point having more CPU
>>>>> > cores shows diminishing returns in terms of QPS. Our index size is about 8GB
>>>>> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
>>>>> > Currently I have the heap to 8GB.
>>>>> >
>>>>> > We are looking to get more servers to increase capacity and because the
>>>>> > warranty is set to expire on our old servers and so I was curious before
>>>>> > asking for a certain spec what others run and at what point does having more
>>>>> > cores cease to matter? Mainly looking at somewhere between 4-12 cores per
>>>>> > server.
>>>>> >
>>>>> > Thanks!
>>>>> > Amit
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lance Norskog
>>>>> goksron@gmail.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>>
>>>
>>
>>
>>
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Hardware Specs Question

Posted by Amit Nithian <an...@gmail.com>.
Lance,

Makes sense. I have heard about the long GC times on large heaps, but I
personally haven't experienced a slowdown, though that doesn't mean anything
either :-). Agreed that tuning the Solr caching is the way to go.

I haven't followed all the Solr/Lucene changes, but from what I remember
there are synchronization points that could be a bottleneck, where adding
more cores won't help. Or am I completely missing something?

Thanks again
Amit

On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹) <sc...@udngroup.com>wrote:

> I am also curious as Amit does. Can you make an example about the garbage
> collection problem you mentioned?
>
> ----- Original Message ----- From: "Lance Norskog" <go...@gmail.com>
> To: <so...@lucene.apache.org>
> Sent: Tuesday, August 31, 2010 9:14 AM
> Subject: Re: Hardware Specs Question
>
>
>
>  It generally works best to tune the Solr caches and allocate enough
>> RAM to run comfortably. Linux & Windows et. al. have their own cache
>> of disk blocks. They use very good algorithms for managing this cache.
>> Also, they do not make long garbage collection passes.
>>
>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <an...@gmail.com> wrote:
>>
>>> Lance,
>>>
>>> Thanks for your help. What do you mean by that the OS can keep the index
>>> in
>>> memory better than Solr? Do you mean that you should use another means to
>>> keep the index in memory (i.e. ramdisk)? Is there a generally accepted
>>> heap
>>> size/index size that you follow?
>>>
>>> Thanks
>>> Amit
>>>
>>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <go...@gmail.com>
>>> wrote:
>>>
>>>  The price-performance knee for small servers is 32G ram, 2-6 SATA
>>>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>>>> them, leaving room for expansion.
>>>>
>>>> I have not done benchmarks about the max # of processors that can be
>>>> kept busy during indexing or querying, and the total numbers: QPS,
>>>> response time averages & variability, etc.
>>>>
>>>> If your index file size is 8G, and your Java heap is 8G, you will do
>>>> long garbage collection cycles. The operating system is very good at
>>>> keeping your index in memory- better than Solr can.
>>>>
>>>> Lance
>>>>
>>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com>
>>>> wrote:
>>>> > Hi all,
>>>> >
>>>> > I am curious to know get some opinions on at what point having more CPU
>>>> > cores shows diminishing returns in terms of QPS. Our index size is about 8GB
>>>> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
>>>> > Currently I have the heap to 8GB.
>>>> >
>>>> > We are looking to get more servers to increase capacity and because the
>>>> > warranty is set to expire on our old servers and so I was curious before
>>>> > asking for a certain spec what others run and at what point does having more
>>>> > cores cease to matter? Mainly looking at somewhere between 4-12 cores per
>>>> > server.
>>>> >
>>>> > Thanks!
>>>> > Amit
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goksron@gmail.com
>>>>
>>>>
>>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>>
>
>
>
>

Re: Hardware Specs Question

Posted by "scott chu (朱炎詹)" <sc...@udngroup.com>.
I am also curious, as Amit is. Can you give an example of the garbage 
collection problem you mentioned?

----- Original Message ----- 
From: "Lance Norskog" <go...@gmail.com>
To: <so...@lucene.apache.org>
Sent: Tuesday, August 31, 2010 9:14 AM
Subject: Re: Hardware Specs Question


> It generally works best to tune the Solr caches and allocate enough
> RAM to run comfortably. Linux & Windows et. al. have their own cache
> of disk blocks. They use very good algorithms for managing this cache.
> Also, they do not make long garbage collection passes.
>
> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <an...@gmail.com> wrote:
>> Lance,
>>
>> Thanks for your help. What do you mean by that the OS can keep the index 
>> in
>> memory better than Solr? Do you mean that you should use another means to
>> keep the index in memory (i.e. ramdisk)? Is there a generally accepted 
>> heap
>> size/index size that you follow?
>>
>> Thanks
>> Amit
>>
>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <go...@gmail.com> wrote:
>>
>>> The price-performance knee for small servers is 32G ram, 2-6 SATA
>>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>>> them, leaving room for expansion.
>>>
>>> I have not done benchmarks about the max # of processors that can be
>>> kept busy during indexing or querying, and the total numbers: QPS,
>>> response time averages & variability, etc.
>>>
>>> If your index file size is 8G, and your Java heap is 8G, you will do
>>> long garbage collection cycles. The operating system is very good at
>>> keeping your index in memory- better than Solr can.
>>>
>>> Lance
>>>
>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com> 
>>> wrote:
>>> > Hi all,
>>> >
>>> > I am curious to know get some opinions on at what point having more 
>>> > CPU
>>> > cores shows diminishing returns in terms of QPS. Our index size is about 8GB
>>> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
>>> > Currently I have the heap to 8GB.
>>> >
>>> > We are looking to get more servers to increase capacity and because 
>>> > the
>>> > warranty is set to expire on our old servers and so I was curious 
>>> > before
>>> > asking for a certain spec what others run and at what point does 
>>> > having
>>> more
>>> > cores cease to matter? Mainly looking at somewhere between 4-12 cores 
>>> > per
>>> > server.
>>> >
>>> > Thanks!
>>> > Amit
>>> >
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>>
>>
>
>
>
> -- 
> Lance Norskog
> goksron@gmail.com
>




Re: Hardware Specs Question

Posted by Lance Norskog <go...@gmail.com>.
It generally works best to tune the Solr caches and allocate enough
RAM to run comfortably. Linux & Windows et al. have their own cache
of disk blocks. They use very good algorithms for managing this cache.
Also, they do not make long garbage collection passes.

On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <an...@gmail.com> wrote:
> Lance,
>
> Thanks for your help. What do you mean by that the OS can keep the index in
> memory better than Solr? Do you mean that you should use another means to
> keep the index in memory (i.e. ramdisk)? Is there a generally accepted heap
> size/index size that you follow?
>
> Thanks
> Amit
>
> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <go...@gmail.com> wrote:
>
>> The price-performance knee for small servers is 32G ram, 2-6 SATA
>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>> them, leaving room for expansion.
>>
>> I have not done benchmarks about the max # of processors that can be
>> kept busy during indexing or querying, and the total numbers: QPS,
>> response time averages & variability, etc.
>>
>> If your index file size is 8G, and your Java heap is 8G, you will do
>> long garbage collection cycles. The operating system is very good at
>> keeping your index in memory- better than Solr can.
>>
>> Lance
>>
>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com> wrote:
>> > Hi all,
>> >
>> > I am curious to know get some opinions on at what point having more CPU
>> > cores shows diminishing returns in terms of QPS. Our index size is about
>> 8GB
>> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
>> > Currently I have the heap to 8GB.
>> >
>> > We are looking to get more servers to increase capacity and because the
>> > warranty is set to expire on our old servers and so I was curious before
>> > asking for a certain spec what others run and at what point does having
>> more
>> > cores cease to matter? Mainly looking at somewhere between 4-12 cores per
>> > server.
>> >
>> > Thanks!
>> > Amit
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Hardware Specs Question

Posted by Amit Nithian <an...@gmail.com>.
Lance,

Thanks for your help. What do you mean by that the OS can keep the index in
memory better than Solr? Do you mean that you should use another means to
keep the index in memory (i.e. ramdisk)? Is there a generally accepted heap
size/index size that you follow?

Thanks
Amit

On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <go...@gmail.com> wrote:

> The price-performance knee for small servers is 32G ram, 2-6 SATA
> disks on a raid, 8/16 cores. You can buy these servers and half-fill
> them, leaving room for expansion.
>
> I have not done benchmarks about the max # of processors that can be
> kept busy during indexing or querying, and the total numbers: QPS,
> response time averages & variability, etc.
>
> If your index file size is 8G, and your Java heap is 8G, you will do
> long garbage collection cycles. The operating system is very good at
> keeping your index in memory- better than Solr can.
>
> Lance
>
> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com> wrote:
> > Hi all,
> >
> > I am curious to know get some opinions on at what point having more CPU
> > cores shows diminishing returns in terms of QPS. Our index size is about
> 8GB
> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
> > Currently I have the heap to 8GB.
> >
> > We are looking to get more servers to increase capacity and because the
> > warranty is set to expire on our old servers and so I was curious before
> > asking for a certain spec what others run and at what point does having
> more
> > cores cease to matter? Mainly looking at somewhere between 4-12 cores per
> > server.
> >
> > Thanks!
> > Amit
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Hardware Specs Question

Posted by Lance Norskog <go...@gmail.com>.
The price-performance knee for small servers is 32G RAM, 2-6 SATA
disks in a RAID, 8/16 cores. You can buy these servers and half-fill
them, leaving room for expansion.

I have not done benchmarks about the max # of processors that can be
kept busy during indexing or querying, and the total numbers: QPS,
response time averages & variability, etc.

If your index file size is 8G and your Java heap is 8G, you will get
long garbage collection cycles. The operating system is very good at
keeping your index in memory, better than Solr can.
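
A rough sketch of the memory budget behind that advice, using the sizes
mentioned in this thread; the 2G OS overhead and the candidate heap
sizes are illustrative assumptions, not recommendations:

# Keep the Java heap small enough that the OS page cache can hold most
# (or all) of the index.  All figures here are illustrative.
ram_gb = 16
index_gb = 8
os_overhead_gb = 2      # assumed room for the OS and other processes

for heap_gb in (8, 6, 4):
    page_cache_gb = max(0, ram_gb - heap_gb - os_overhead_gb)
    cached_fraction = min(1.0, page_cache_gb / index_gb)
    print(f"heap {heap_gb}G -> ~{page_cache_gb}G page cache "
          f"({cached_fraction:.0%} of the {index_gb}G index)")

The point is just that every gigabyte given to the Java heap is a
gigabyte the operating system cannot use to cache the index files.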

Lance

On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <an...@gmail.com> wrote:
> Hi all,
>
> I am curious to know get some opinions on at what point having more CPU
> cores shows diminishing returns in terms of QPS. Our index size is about 8GB
> and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
> Currently I have the heap to 8GB.
>
> We are looking to get more servers to increase capacity and because the
> warranty is set to expire on our old servers and so I was curious before
> asking for a certain spec what others run and at what point does having more
> cores cease to matter? Mainly looking at somewhere between 4-12 cores per
> server.
>
> Thanks!
> Amit
>



-- 
Lance Norskog
goksron@gmail.com