Posted to dev@lucene.apache.org by Julien Nioche <Ju...@lingway.com> on 2003/07/09 11:22:51 UTC

Re: Directory implementation using NIO (moved from Lucene User List)

Hello Francesco,

I ran a small test of your NIODirectory implementation on a 384 MB index,
using a few hundred different queries, on Windows 2000 with 523 MB of RAM.

On the first run (with -Xms200M and -Xmx300M)
> FSDirectory :: 241 sec
> NIODirectory :: 206 sec

I tried a second time (there is surely a caching effect in the OS)
> FSDirectory :: 223 sec
> NIODirectory :: 182 sec

Of course the absolute values of the results are not really interesting. In my
case and with my hardware configuration, the index is too big to be loaded
in memory with RAMDirectory.
On a smaller index, or with less RAM available, should FSDirectory be faster?

BTW modifications are to be made in the org.apache.lucene.store.InputStream
class, not in Directory.

Has anybody else tried it? Do you find similar results? What does it bring
on a bigger index?

Cheers

-----------------------------------
Julien Nioche
Consultant in Linguistic Engineering
www.lingway.com


----- Original Message -----
From: "Francesco Bellomi" <fb...@libero.it>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Sunday, July 06, 2003 10:49 PM
Subject: Directory implementation using NIO


> Hi,
>
> I developed a Directory implementation that accesses an index stored on the
> filesystem using memory-mapped files (as provided by the NIO API, introduced
> in Java 1.4).
>
> You can download the compiled jar and the source from here:
> www.fran.it/lucene-NIO.zip
>
> Basically there are 3 new classes: NIODirectory, NIOInputStream and
> NIOOutputStream. They are heavily based on FSDirectory, FSInputStream and
> FSOutputStream.
>
> NIOInputStream provides memory-mapped access to files. It does not rely on
> Lucene InputStream's caching feature, since direct access to the
> memory-mapped file should be faster. Also, cloned streams with independent
> positions are implemented using NIO buffer duplication (a buffer duplicate
> holds the same content but has its own position), and so the implementation
> logic is much simpler than FSInputStream's.
>
> Some methods of Directory have been overridden to replace the caching
> feature. Some of them were final in Directory, so I have used a slightly
> modified version of Directory.java (BTW, I wonder why so many methods in
> Lucene are made final...)
>
> These classes only work with the recently released Java 1.4.2. This is due
> to the fact that buffers connected with memory-mapped files could not be
> programmatically unmapped in previous releases (they were unmapped only
> through finalization), and actively mapped files cannot be deleted. These
> issues are partially resolved with 1.4.2.
>
> NIOOutputStream is the same as FSOutputStream; I don't know any way to take
> advantage of NIO for writing indexes (memory-mapped buffers have a static
> size, so they are not useful if your file is growing).
>
> I don't have a benchmarking suite for Lucene, so I can't accurately evaluate
> the speed of this implementation. I tested it on a small application I am
> developing and it seems to work well, but I think my tests are not
> significant. Of course only the searching feature is expected to be
> faster, since the index writing is unchanged.
>
> Francesco
>
> -
> Francesco Bellomi
> "Use truth to show illusion,
> and illusion to show truth."
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Directory implementation using NIO (moved from Lucene User List)

Posted by Incze Lajos <in...@mail.matav.hu>.
> Roughly, a 20% speedup
> 
> > In my case and with my hardware configuration, the index
> > is too big to be loaded in memory with RAMDirectory.
> > On a smaller index or with less RAM available FSDirectory should be
> > faster?
> 

On a system that buffers very heavily (e.g. Linux), I would not expect
more of a difference than this 20%. The main difference is not that one is
in memory and the other is not, but whether access goes through inode-based
or page-mapped I/O (both from memory, mostly). That accounts for the difference.
(At least I think so.)

incze



Re: Directory implementation using NIO (moved from Lucene User List)

Posted by "Francesco Bellomi (UNIVR)" <be...@profs.sci.univr.it>.
Hi Julien,
I just subscribed to lucene-dev.

First of all, thank you for your tests.

> On the first run (with Xms 200M and Xmx300M)
>> FSDirectory :: 241 sec
>> NIODirectory :: 206 sec
>
> I tried a second time (there is surely a caching effect in the OS)
>> FSDirectory :: 223 sec
>> NIODirectory :: 182 sec

Roughly, a 20% speedup
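
For reference, the estimate can be recomputed from the four timings quoted
above. The sketch below is only illustrative; the class and method names are
made up for this example. It reports both the reduction in wall-clock time and
the equivalent gain in throughput, which together bracket the "roughly 20%"
figure.

```java
import java.util.Locale;

public class SpeedupCheck {
    // Percentage of wall-clock time saved when moving from baseline to candidate.
    static double timeSavedPercent(double baselineSec, double candidateSec) {
        return 100.0 * (baselineSec - candidateSec) / baselineSec;
    }

    // Percentage gain in throughput (work done per second) for the same workload.
    static double throughputGainPercent(double baselineSec, double candidateSec) {
        return 100.0 * (baselineSec / candidateSec - 1.0);
    }

    public static void main(String[] args) {
        // {FSDirectory seconds, NIODirectory seconds} for the two runs in this thread.
        double[][] runs = { {241, 206}, {223, 182} };
        for (double[] r : runs) {
            System.out.printf(Locale.ROOT,
                "FS %.0f s, NIO %.0f s: %.1f%% less time, %.1f%% more throughput%n",
                r[0], r[1],
                timeSavedPercent(r[0], r[1]),
                throughputGainPercent(r[0], r[1]));
        }
        // Prints:
        // FS 241 s, NIO 206 s: 14.5% less time, 17.0% more throughput
        // FS 223 s, NIO 182 s: 18.4% less time, 22.5% more throughput
    }
}
```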

> In my case and with my hardware configuration, the index
> is too big to be loaded in memory with RAMDirectory.
> On a smaller index or with less RAM available FSDirectory should be
> faster?

Memory mapping relies on the low-level virtual memory system: simplifying,
it caches file blocks in RAM as they are accessed, and discards the least
recently used blocks when more space is needed. The filesystem takes a
similar approach, but the caching happens at several levels, including a
Java-based one. I think NIO is faster simply because it's more direct and
low-level. This advantage should also hold with less RAM and a smaller
index, but only testing will tell.
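
As a rough illustration of the kind of access discussed here, the sketch below
maps a file read-only and uses a buffer duplicate as an independently
positioned "clone". It is not the NIODirectory code: the class and method
names are hypothetical, and it uses the later java.nio.file API for brevity,
which is an assumption that does not hold on Java 1.4.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadSketch {
    // Maps the whole file read-only. A single mapping is limited to 2 GB,
    // so a larger file would need several mappings.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-sketch", ".bin");
        Files.write(file, new byte[] {10, 20, 30, 40});

        ByteBuffer original = mapReadOnly(file);
        // duplicate() shares the mapped content but has its own position.
        ByteBuffer clone = original.duplicate();

        original.get();
        original.get(); // advance only the original

        System.out.println(original.position()); // 2
        System.out.println(clone.position());    // 0
        System.out.println(clone.get());         // 10
    }
}
```

The temporary file is deliberately left in place: as noted earlier in the
thread, a file that is still mapped may not be deletable on some platforms.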

It would be interesting to test NIO against RAM, with an index small
enough to be entirely contained in memory. Cached memory-mapped buffers
should (theoretically) be accessed nearly as fast as RAM.

However, the case I'm really interested in is the one you tested (a
comparison against FS with an index bigger than memory), and I'm happy to
see there is a speedup.

>
> BTW modifications are to be made in the
> org.apache.lucene.store.InputStream class, not in Directory.

You are absolutely right.

>
> Has anybody else tried it? Do you find similar results?
> What does it
> bring on a bigger index?
>
> Cheers
> Julien Nioche

Thanks indeed,
Francesco

-
Francesco Bellomi
"Use truth to show illusion,
and illusion to show truth."


