Posted to java-user@lucene.apache.org by zhoucheng2008 <zh...@gmail.com> on 2011/06/06 15:29:58 UTC

RAMDirectory doesn't win over FSDirectory all the time, why?

I read the Lucene in Action book and just tested
FSversusRAMDirectoryTest.java with the following uncommented:

    //    /**
    //    // change to adjust performance of indexing with FSDirectory
        writer.mergeFactor = 100;
        writer.maxMergeDocs = 999999;
        writer.minMergeDocs = 1000;
    //    */

Here is the output:

RAMDirectory Time: 805 ms
FSDirectory Time : 728 ms


RE: RAMDirectory doesn't win over FSDirectory all the time, why?

Posted by zhoucheng2008 <zh...@gmail.com>.
I ran it on 64-bit Windows 7 with Lucene 3.0.3. FSDirectory consistently outperformed RAMDirectory in this case across a bunch of test runs.

My wild guess is that FSDirectory benefits from MMapDirectory as well as from the three tuning parameters. Just a thought.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




RE: RAMDirectory doesn't win over FSDirectory all the time, why?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

It depends on the Lucene version: if the test uses the latest Lucene on a
64-bit OS, it may use MMapDirectory internally (returned by
FSDirectory.open()) - then it's comparing like with like - reading
from RAM :-)
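
For illustration only, here is what a memory-mapped read looks like in plain Java. As far as I know MMapDirectory is built on the same FileChannel.map mechanism, but the class and method names below (MmapReadSketch, readViaMmap, roundTrip) are made up for this sketch and are not Lucene API. Once the file's pages are resident in the OS cache, reading through the mapping is a memory read, which is why it can end up close to an in-heap directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch; not part of Lucene. */
final class MmapReadSketch {

    /**
     * Maps the whole file read-only and copies its bytes out of the mapping.
     * Assumes the file is smaller than 2 GB (a single mapping is limited to
     * Integer.MAX_VALUE bytes).
     */
    static byte[] readViaMmap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[buf.remaining()];
            buf.get(out); // served from the OS page cache when the pages are resident
            return out;
        }
    }

    /** Writes data to a temporary file and reads it back through a mapping. */
    static byte[] roundTrip(byte[] data) {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".bin");
            // A mapped file may not be deletable right away on Windows,
            // so deletion is deferred to JVM exit.
            tmp.toFile().deleteOnExit();
            Files.write(tmp, data);
            return readViaMmap(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] back = roundTrip("hello index".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```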

Maybe the difference is also caused by not warming up HotSpot's compiler. Those
times are only meaningful if you repeat the same code path many times and use
only the later results, not the ones from the first iterations (as Java's
HotSpot is still optimizing).
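
A minimal sketch of that warm-up idea, using only the JDK. WarmupTimer and its methods are hypothetical names invented here, and the workload in main is a stand-in; in the real test the Runnable would wrap the indexing call. It is not a substitute for a proper benchmark harness, only an illustration of discarding the first iterations.

```java
import java.util.Arrays;

/** Hypothetical helper: times a task after discarding warm-up runs. */
final class WarmupTimer {

    /**
     * Runs the task warmup + measured times and returns the elapsed
     * nanoseconds of the measured runs only.
     */
    static long[] time(Runnable task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // timing discarded: the JIT may still be compiling
        }
        long[] nanos = new long[measured];
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median of the samples; less sensitive to a stray pause than the mean. */
    static long median(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Stand-in workload, only here so the sketch runs on its own.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() != 100_000) {
                throw new IllegalStateException();
            }
        };
        long[] runs = time(work, 20, 10);
        System.out.println("median ms: " + median(runs) / 1_000_000.0);
    }
}
```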

We did not get any information about how these numbers were measured.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: RAMDirectory doesn't win over FSDirectory all the time, why?

Posted by Michael McCandless <lu...@mikemccandless.com>.
This test is very old (from the 1st edition of the book but removed
from the 2nd).

Modern OSes cache newly written files in RAM, and this test doesn't
write very large files (I think?), so the test is really comparing the
OS's I/O cache against Lucene's RAMDirectory.

That said, I'm not sure why RAMDir would be slower... FSDir still must
go through the OS APIs even if the OS then caches in RAM.

Mike McCandless

http://blog.mikemccandless.com



Re: RAMDirectory doesn't win over FSDirectory all the time, why?

Posted by Sanne Grinovero <sa...@gmail.com>.
Hello,
I came to similar conclusions, and have a similar comparison test
available here:
https://github.com/infinispan/infinispan/blob/master/lucene-directory/src/test/java/org/infinispan/lucene/profiling/PerformanceCompareStressTest.java

In my test I explicitly run the RAMDirectory first to warm up the JVM
and the other Lucene components. Also, while I default to a short
testing time, to perform a fair comparison you should:
a) make the test quite long - a couple of hours
b) start with a fairly large index; this version starts with an empty
index that slowly grows.

I run the RAMDirectory first because, to be fair, in my case I wasn't
very interested in its performance: being limited to the memory
available to your JVM is imho quite a dealbreaker for real applications.
Also, the operating system can apply several smart caches when there's
enough memory. My conclusion is that when you have memory, you should
limit the JVM heap and leave the rest to the OS so that it can make
better use of FSDirectory, as this implementation is really well
optimized, at least for local disks.

When you don't have enough available memory, I would suggest - but
warning: I'm biased - trying the Infinispan-based Lucene Directory,
which is able to "join forces" by pooling the memory of multiple
(remote) JVMs and passivates to external storage such as disk only when
strictly needed (or for backups/shutdown). Being still mostly an
in-memory solution, it's able to outperform the FSDirectory during
write operations and is comparable in search performance: in some cases
a little bit slower, but it compensates by being able to scale
horizontally with real-time distribution. A current limitation is that
you still need to use a single IndexWriter, even cluster-wide: the
code is very simple and directly mimics the FSDirectory logic, so it
supports all the same features and inherits the same limitations,
unlike other distributed solutions.

Regards,
Sanne



Re: RAMDirectory doesn't win over FSDirectory all the time, why?

Posted by Lance Norskog <go...@gmail.com>.
A RAMDirectory uses Java heap memory; an FSDirectory does not. Holding
Java memory makes garbage collection work harder. The operating system
is very, very good at managing disk buffers, and does a better job of
using spare memory than Java does.

For real-world sites, RAMDirectory is almost always useless. Maybe the
Instantiated index stuff is more what you want?

Lance




-- 
Lance Norskog
goksron@gmail.com



RE: RAMDirectory doesn't win over FSDirectory all the time, why?

Posted by zhoucheng2008 <zh...@gmail.com>.
Makes sense. Thanks



Re: RAMDirectory doesn't win over FSDirectory all the time, why?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2011-06-06 at 15:29 +0200, zhoucheng2008 wrote:
> I read the lucene in action book and just tested the
> FSversusRAMDirectoryTest.java with the following uncommented:
> [...]Here is the output:
> 
> RAMDirectory Time: 805 ms
> 
> FSDirectory Time : 728 ms

This is the code, right?
http://java.codefetch.com/example/in/LuceneInAction/src/lia/indexing/FSversusRAMDirectoryTest.java

The test is problematic as the two tests run sequentially in the same
JVM, so the second one likely benefits from the warm-up done by the first.

If you change 
  long ramTiming = timeIndexWriter(ramDir);
  long fsTiming = timeIndexWriter(fsDir);
to
  long fsTiming = timeIndexWriter(fsDir);
  long ramTiming = timeIndexWriter(ramDir); 
my guess is that RAMDirectory will be faster. For a better
comparison, perform each test in separate runs (make a test
class just for RAMDirectory and one just for FSDirectory,
then run them one at a time, each in its own JVM).
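
One way to get the "each in its own JVM" setup without starting the runs by hand is to launch a fresh JVM per test class. This is only a sketch: SeparateJvmRunner is a hypothetical name, and the main classes to run are taken from the command line, so nothing here assumes how the two timing classes are actually called. It reuses the current JVM's java binary and classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher: runs each given main class in its own JVM. */
final class SeparateJvmRunner {

    /** Builds the command line for a fresh JVM running mainClass. */
    static List<String> command(String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        String classpath = System.getProperty("java.class.path");
        return Arrays.asList(javaBin, "-cp", classpath, mainClass);
    }

    /** Starts the child JVM, forwards its output, and returns its exit code. */
    static int runInFreshJvm(String mainClass) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(mainClass)).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java SeparateJvmRunner <MainClass> [<MainClass> ...]");
            return;
        }
        // One JVM per class, one at a time, so no run warms up the next.
        for (String mainClass : args) {
            int exit = runInFreshJvm(mainClass);
            System.out.println(mainClass + " exit code: " + exit);
        }
    }
}
```

Each child starts with a cold JIT, so the timing code inside each test class still needs its own warm-up iterations.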

One big problem when comparing RAMDirectory to file-access
is caching. What you measure with a test might not be what
you see in production, as the production index might be
large compared to RAM available for file caching.

