Posted to java-user@lucene.apache.org by Toke Eskildsen <te...@statsbiblioteket.dk> on 2008/03/13 12:03:44 UTC

Solid State Drives vs. RAMDirectory

Time for another dose of inspiration for investigating Solid State
Drives. And no, I don't get percentages from the chip manufacturers :-)

This time I'll argue that there's little gain in using a RAMDirectory
over SSDs when performing searches. At least for our setting.


We've taken our production index of about 10 million documents / 37GB
and reduced it to 14GB by removing documents uniformly across the index.
A test with fairly simple searches was performed, using logged queries
from our production system (see the thread "Multiple Searchers" on this
mailing list for details) and extracting the content of a stored field
for the first 20 hits of each search.

On a dual-core Xeon machine with 24GB of RAM, the full index can be
loaded into RAM with a RAMDirectory. The following is the average speed
over 340,000 queries. In the log names, t2 signifies 2 threads with a
shared searcher, t2u signifies 2 threads with separate searchers.

metis_RAM_24GB_i14_v23_t1_l23.log       530.0 q/sec
metis_RAM_24GB_i14_v23_t2_l23.log       888.2 q/sec
metis_RAM_24GB_i14_v23_t2u_l23.log      983.9 q/sec
metis_RAM_24GB_i14_v23_t3_l23.log       843.1 q/sec
metis_RAM_24GB_i14_v23_t3u_l23.log      996.1 q/sec
metis_RAM_24GB_i14_v23_t4_l23.log       869.8 q/sec
metis_RAM_24GB_i14_v23_t4u_l23.log      943.4 q/sec

As can be seen, the best performing configuration was 3 threads with
separate searchers. The time for loading the index into RAM was ignored.
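The shared-vs-separate searcher distinction (tN vs tNu) can be sketched as follows. This is a minimal, hypothetical harness, not the actual benchmark code: `Searcher` here is a toy stand-in for Lucene's IndexSearcher, and the query distribution scheme is an assumption.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SearcherThreading {
    // Hypothetical stand-in for Lucene's IndexSearcher; search() returns
    // a hit count so the sketch has something to aggregate.
    public interface Searcher {
        int search(String query);
    }

    // Runs all queries across `threads` workers. Each worker obtains its
    // Searcher from the supplier once: a supplier that always returns the
    // same instance models the tN runs (shared searcher), one that returns
    // a fresh instance per call models the tNu runs (separate searchers).
    public static int run(Supplier<Searcher> supplier, List<String> queries,
                          int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger next = new AtomicInteger();
        AtomicInteger hits = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                Searcher searcher = supplier.get(); // per-worker acquisition
                int i;
                while ((i = next.getAndIncrement()) < queries.size()) {
                    hits.addAndGet(searcher.search(queries.get(i)));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Searcher shared = q -> q.length(); // toy searcher
        List<String> queries = List.of("foo", "lucene", "ssd");
        System.out.println(run(() -> shared, queries, 2));          // t2-style
        System.out.println(run(() -> q -> q.length(), queries, 2)); // t2u-style
    }
}
```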


Now for the interesting part: Reducing the amount of available RAM to
3GB and using SSDs instead.

metis_MTRONSSD_RAID0_3GB_i14_v23_t1_l23.log     433.7 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t2_l23.log     573.4 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t2u_l23.log    783.4 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t3_l23.log     459.7 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t3u_l23.log    808.5 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t4_l23.log     455.3 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log    809.0 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t5_l23.log     454.4 q/sec

In comparison, the same test with 3GB of RAM on 15,000 RPM hard disks in
RAID 1 gave these numbers:

metis_15000RPM_RAID1_3GB_i14_v23_t1_l23.log     176.6 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t2_l23.log     188.6 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t2u_l23.log    247.1 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3_l23.log     178.4 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3u_l23.log    276.1 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4_l23.log     177.8 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4u_l23.log    259.3 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t5_l23.log     178.5 q/sec

SSDs do not match RAMDirectory in speed for this setup, but 81% is not
bad, especially when compared to the 28% for conventional hard disks.


Performing the same tests with 8GB of available RAM on the machine gave
the following results:

metis_MTRONSSD_RAID0_8GB_i14_v23_t1_l23.log     431.9 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t2_l23.log     594.3 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t2u_l23.log    807.7 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t3_l23.log     472.3 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t3u_l23.log    817.6 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t4_l23.log     464.4 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log    828.8 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t5_l23.log     471.2 q/sec

metis_15000RPM_RAID1_8GB_i14_v23_t1_l23.log     199.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t2_l23.log     220.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t2u_l23.log    312.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t3_l23.log     203.8 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t3u_l23.log    370.9 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4_l23.log     203.1 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log    408.1 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t5_l23.log     202.5 q/sec

Switching to 12GB...

metis_MTRONSSD_RAID0_12GB_i14_v23_t1_l23.log    438.8 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t2_l23.log    587.8 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t2u_l23.log   819.9 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t3_l23.log    476.4 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t3u_l23.log   833.7 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t4_l23.log    465.4 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t4u_l23.log   835.2 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t5_l23.log    467.1 q/sec

metis_15000RPM_RAID1_12GB_i14_v23_t1_l23.log    198.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t2_l23.log    219.1 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t2u_l23.log   309.4 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t3_l23.log    204.1 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t3u_l23.log   362.4 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4_l23.log    202.3 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log   406.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t5_l23.log    201.2 q/sec


Extracting the fastest configurations for the different RAM amounts:

RAMDirectory (24GB of RAM):
metis_RAM_24GB_i14_v23_t3u_l23.log      996.1 q/sec

3GB of RAM:
metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log    809.0 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3u_l23.log    276.1 q/sec

8GB of RAM:
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log    828.8 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log    408.1 q/sec

12GB of RAM:
metis_MTRONSSD_RAID0_12GB_i14_v23_t4u_l23.log   835.2 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log   406.6 q/sec

As can be seen, the SSDs benefit somewhat from running at 8GB, while the
hard drives benefit a lot. Plotting a graph of queries/second over time
shows clearly that the performance of the hard drives relative to the
RAM speed climbs steadily, while the SSD speed does not (or at least
very little). This tells me that the speed of SSD-stored indexes is
fairly independent of the amount of RAM available for caching.

Upping the amount to 12GB doesn't change much. Clearly 8GB is "enough"
for our 14GB index with our queries.
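As an aside, the log-name convention used throughout can be decoded mechanically. A small sketch; the i and v fields are left unparsed, since their meaning isn't spelled out here:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Decodes log names like metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log
// into storage backend, RAM size, thread count, and shared vs separate
// searchers (the trailing "u" on the thread field).
public class LogName {
    private static final Pattern P = Pattern.compile(
        "metis_(.+)_(\\d+)GB_i\\d+_v\\d+_t(\\d+)(u?)_l\\d+\\.log");

    public static String describe(String name) {
        Matcher m = P.matcher(name);
        if (!m.matches()) throw new IllegalArgumentException(name);
        return m.group(1) + ", " + m.group(2) + "GB RAM, " + m.group(3)
             + (m.group(4).isEmpty() ? " threads, shared searcher"
                                     : " threads, separate searchers");
    }
}
```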


At the risk of making all this unclear, let's try to ignore the first
5,000 queries and cut off the statistics after 50,000 queries. This
mimics a setting with a warm-up phase and a not-so-stale index that gets
replaced once in a while. Extracting the fastest configurations for the
different RAM amounts gives us:

RAMDirectory (24GB of RAM):
metis_RAM_24GB_i14_v23_t2u_l23.log 867.3 q/sec

3GB of RAM:
metis_MTRONSSD_RAID0_3GB_i14_v23_t3u_l23.log 663.2 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4u_l23.log 163.4 q/sec

8GB of RAM:
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log 653.6 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log 163.4 q/sec

12GB of RAM:
metis_MTRONSSD_RAID0_12GB_i14_v23_t3u_l23.log 653.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log 163.4 q/sec

Yes, the 3*163.4 is a funny coincidence; I double-checked and looked at
the graphs: up until about 60,000 queries, the graphs are virtually
identical for the 15,000 RPM drives, then the one for 3GB of RAM
stabilizes while the ones for 8 and 12GB continue climbing, virtually
identical to each other. For the SSDs, the graph for 3GB is a bit higher
than the others until about 50,000-60,000 queries, then a bit lower for
the rest.

For this scenario, the speed of SSDs compared to RAMDirectory drops to
75-76%, while the speed of hard disks drops to 19%, fairly independently
of RAM. In other words: upping the amount of RAM does not help us when
the index is replaced before we pass 50,000 queries.
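The warm-up/cutoff averaging described above can be computed with a small helper. A minimal sketch, assuming per-query completion timestamps are logged (the actual log format used in these tests is not shown here):

```java
public class WindowedThroughput {
    // Average queries/second over queries [warmup, cutoff), given each
    // query's completion time in milliseconds since the test started.
    // Mirrors the measurement above: skip the first `warmup` queries as
    // warm-up, stop counting at `cutoff`. Assumes warmup >= 1 and that
    // completion times are non-decreasing.
    public static double qps(long[] completionMillis, int warmup, int cutoff) {
        cutoff = Math.min(cutoff, completionMillis.length);
        long elapsed = completionMillis[cutoff - 1] - completionMillis[warmup - 1];
        return (cutoff - warmup) * 1000.0 / elapsed;
    }
}
```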

Another observation: the faster we change our index, the better SSDs
look compared to hard disks. On the flip side, for long run-times with
an unchanged index, hard disks seem the better choice, at least from an
economic point of view.


Grand conclusion? Getting 3/4 of the performance of RAMDirectory by
using SSDs on a machine with much less RAM seems like a good deal if
high performance per machine is needed.


Remember, this is all searches against an optimized index. This is on
the corpus from the Danish State and University Library and should be
seen as nothing more than inspiration.

Still pending are experiments with updating large indexes on SSDs. My
guess is that there won't be anywhere near the same speed increase as
for the pure searches. It'll have to wait a bit though, as it requires
Real Work, as opposed to just starting a script.


NB: I'd like to post my findings on the Lucene wiki, but I have been
unable to locate the appropriate page. Could someone please point me in
the right direction?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: [aside] Re: Solid State Drives vs. RAMDirectory

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2008-03-13 at 08:37 -0400, Grant Ingersoll wrote:
> Is this corpus publicly available?  If so, please share.  I'm always  
> on the hunt for free data!

I'm sorry. It's the bibliographic records from the State and University
Library of Denmark and we're not allowed to share them.




[aside] Re: Solid State Drives vs. RAMDirectory

Posted by Grant Ingersoll <gs...@apache.org>.
Slight aside below:
On Mar 13, 2008, at 7:58 AM, Srikant Jakilinki wrote:
>>
>> Remember, this is all searches with an optimized index. This is on  
>> the
>> corpus from the Danish State and University Library and should be  
>> seen
>> as nothing else than inspiration.
>>

Is this corpus publicly available?  If so, please share.  I'm always  
on the hunt for free data!

Thanks,
Grant



Re: Solid State Drives vs. RAMDirectory

Posted by Srikant Jakilinki <sr...@bluebottle.com>.
Hi Toke,

Thanks for the write-up. Speaking for the community, the graphs (as
earlier) would be great.

There is no benchmarks page on the wiki. There is one on the main site
to which you can add your material:
http://lucene.apache.org/java/2_1_0/benchmarks.html
Maybe someone should create one on the wiki - not called benchmarks
exactly, but titled performance studies/findings or something similar.

BTW, guys, I am slowly moving into testing performance with multiple
disks on multi-core machines (one indexer/searcher/index per disk).
Initial results are encouraging. If anyone has pointers or has done
something similar, that would be great.

Thanks
Srikant

Toke Eskildsen wrote:
> Time for another dose of inspiration for investigating Solid State
> Drives. And no, I don't get percentages from the chip manufacturers :-)
>
> This time I'll argue that there's little gain in using a RAMDirectory
> over SSDs, when performing searches. At least for our setting.
>
> [...]
>
> Grand conclusion? Getting 3/4 of the performance of RAMDirectory by
> using SSDs on a machine with much less RAM seems like a good deal if
> high performance / machine is needed.
>
> Remember, this is all searches with an optimized index. This is on the
> corpus from the Danish State and University Library and should be seen
> as nothing else than inspiration.
>
> Still pending is experiments with updating large indexes on SSDs. My
> guess is that there won't be anywhere near the same speed-increase as
> for the pure searches. It'll have to wait a bit though, as it requires
> Real Work, as opposed to just starting a script.
>
>
> NB: I'd like to post my findings on the Lucene wiki, but I have been
> unable to locate the appropriate page. Could someone please point me in
> the right direction?
>





Re: Pooled searcher

Posted by Karl Wettin <ka...@gmail.com>.
It would be great if you did. Please reply in LUCENE-1265.


Jake Mannix skrev:
> We started doing the same thing (pooling 1 searcher per core) at my
> work when profiling showed a lot of time spent in synchronized blocks
> deep inside the SegmentTermReader (? might be messing the class name
> up) under high load, due to file read()s using instance variables for
> seeking.  I could dig up the details if you'd like.
> 
> -jake
> 
> 
> 
> On 4/16/08, Karl Wettin <ka...@gmail.com> wrote:
>> Toke Eskildsen skrev:
>>> In the log names, t2 signifies 2 threads with a shared
>>> searcher, t2u signifies 2 threads with separate searchers.
>>>
>>> metis_RAM_24GB_i14_v23_t1_l23.log       530.0 q/sec
>>> metis_RAM_24GB_i14_v23_t2_l23.log       888.2 q/sec
>> Did someone end up investigating this thing with pooled searchers and
>> why it is a performance boost?
>>
>>
>>          karl
>>
>>
>>
> 




Re: Pooled searcher (was: Solid State Drives vs. RAMDirectory)

Posted by Jake Mannix <ja...@gmail.com>.
We started doing the same thing (pooling 1 searcher per core) at my
work when profiling showed a lot of time spent in synchronized blocks
deep inside the SegmentTermReader (? might be messing the class name up)
under high load, due to file read()s using instance variables for
seeking.  I could dig up the details if you'd like.

-jake
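The per-thread pooling Jake describes can be sketched with a ThreadLocal. This is an illustrative sketch only; `Searcher` is a hypothetical stand-in for Lucene's IndexSearcher, not the actual code used at his site:

```java
import java.util.function.Supplier;

public class SearcherPool {
    // Hypothetical stand-in for Lucene's IndexSearcher.
    public interface Searcher {
        int search(String query);
    }

    private final ThreadLocal<Searcher> perThread;

    // Each thread lazily creates, then reuses, its own Searcher, so
    // concurrent readers never contend on a single shared reader's
    // synchronized seek-then-read path.
    public SearcherPool(Supplier<Searcher> factory) {
        this.perThread = ThreadLocal.withInitial(factory::get);
    }

    public int search(String query) {
        return perThread.get().search(query);
    }
}
```

The trade-off is memory: one reader's worth of buffers and term caches per thread instead of one total, which is why it pays off mainly under the high-load contention Jake saw in profiling.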



On 4/16/08, Karl Wettin <ka...@gmail.com> wrote:
> Toke Eskildsen skrev:
> > In the log names, t2 signifies 2 threads with a shared
> > searcher, t2u signifies 2 threads with separate searchers.
> >
> > metis_RAM_24GB_i14_v23_t1_l23.log       530.0 q/sec
> > metis_RAM_24GB_i14_v23_t2_l23.log       888.2 q/sec
>
> Did someone end up investigating this thing with pooled searchers and
> why it is a performance boost?
>
>
>          karl
>
>
>




Pooled searcher (was: Solid State Drives vs. RAMDirectory)

Posted by Karl Wettin <ka...@gmail.com>.
Toke Eskildsen skrev:
> In the log names, t2 signifies 2 threads with a shared 
> searcher, t2u signifies 2 threads with separate searchers.
> 
> metis_RAM_24GB_i14_v23_t1_l23.log       530.0 q/sec
> metis_RAM_24GB_i14_v23_t2_l23.log       888.2 q/sec

Did someone end up investigating this thing with pooled searchers and 
why it is a performance boost?


         karl
