Posted to java-user@lucene.apache.org by Francesco Bellomi <fb...@libero.it> on 2003/07/06 22:49:59 UTC

Directory implementation using NIO

Hi,

I developed a Directory implementation that accesses an index stored on the
filesystem using memory-mapped files (as provided by the NIO API, introduced
in Java 1.4).

You can download the compiled jar and the source from here:
www.fran.it/lucene-NIO.zip

Basically there are 3 new classes: NIODirectory, NIOInputStream and
NIOOutputStream. They are heavily based on FSDirectory, FSInputStream and
FSOutputStream.

NIOInputStream provides memory-mapped access to files. It does not rely on
Lucene InputStream's caching feature, since direct access to the
memory-mapped file should be faster. Also, cloned streams with independent
positions are implemented using NIO buffer duplication (a buffer duplicate
holds the same content but has its own position), and so the implementation
logic is much simpler than FSInputStream's.
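
The buffer-duplication idea described above can be sketched as follows. This
is only a minimal illustration, not the code from lucene-NIO.zip: the class
MappedCloneDemo and its methods mapReadOnly and cloneStream are hypothetical
names, and the sketch uses the java.nio.file API from later Java releases
rather than what was available in Java 1.4.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of lucene-NIO.zip.
class MappedCloneDemo {

    // Maps the whole file read-only. The mapping stays valid after the
    // channel that created it has been closed.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // A "clone" shares the mapped bytes with the original but keeps its own
    // position and limit, so no extra bookkeeping is needed.
    static ByteBuffer cloneStream(ByteBuffer original) {
        return original.duplicate();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "abcdefgh".getBytes(StandardCharsets.US_ASCII));

        ByteBuffer stream = mapReadOnly(file);
        stream.position(2);

        ByteBuffer clone = cloneStream(stream); // starts at position 2 as well
        clone.position(6);                      // moving the clone leaves 'stream' alone

        System.out.println((char) stream.get()); // prints c
        System.out.println((char) clone.get());  // prints g
    }
}
```

Seeking in the clone does not disturb the original, which is the property a
cloned input stream needs.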

Some methods of Directory have been overridden to replace the caching
feature. Some of them were final in Directory, so I have used a slightly
modified version of Directory.java (BTW, I wonder why so many methods in
Lucene are made final...)

These classes only work with the recently released Java 1.4.2. This is due
to the fact that buffers connected with memory-mapped files could not be
programmatically unmapped in previous releases (they were unmapped only
through finalization), and actively mapped files cannot be deleted. These
issues are partially resolved with 1.4.2.

NIOOutputStream is the same as FSOutputStream; I don't know any way to take
advantage of NIO for writing indexes (memory mapped buffers have a static
size, so they are not useful if your file is growing).
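
To illustrate the fixed-size limitation mentioned above, here is a small
sketch. It is an assumption-laden demo, not code from lucene-NIO.zip: the
class FixedSizeMappingDemo and the method overflowsAtCapacity are hypothetical
names. It maps a file of a given size for writing, fills the mapping, and
then checks that one more relative put fails instead of growing the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.BufferOverflowException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of lucene-NIO.zip.
class FixedSizeMappingDemo {

    // Returns true if writing one byte past the mapped region is rejected.
    static boolean overflowsAtCapacity(Path file, int size) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(size); // the file length is fixed before mapping
            MappedByteBuffer buf =
                raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int i = 0; i < size; i++) {
                buf.put((byte) i); // fill the mapping completely
            }
            try {
                buf.put((byte) 0); // no room left: the buffer does not grow
                return false;
            } catch (BufferOverflowException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fixed-size", ".bin");
        file.toFile().deleteOnExit();
        System.out.println("overflow at capacity: " + overflowsAtCapacity(file, 16));
    }
}
```

Appending to such a file would mean creating a new, larger mapping each time,
which is why an ordinary output stream is the simpler choice for writing.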

I don't have a benchmarking suite for Lucene, so I can't accurately evaluate
the speed of this implementation. I tested it on a small application I am
developing and it seems to work well, but I think my tests are not
significant. Of course only the searching feature is expected to be
faster, since the index writing is unchanged.

Francesco

-
Francesco Bellomi
"Use truth to show illusion,
and illusion to show truth."



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Directory implementation using NIO (moved from Lucene User List)

Posted by Incze Lajos <in...@mail.matav.hu>.
> Roughly, a 20% speedup
> 
> > In my case and with my hardware configuration, the index
> > is too big to be loaded in memory with RAMDirectory.
> > On a smaller index or with less RAM available FSDirectory should be
> > faster?
> 

On a system that buffers very heavily (e.g. Linux) I would not expect
more difference than this 20%. The main difference is not that one is
in memory and the other is not, but whether access goes through inode-based
or page-mapped I/O (both from memory, mostly). That accounts for the difference.
(At least I think so.)

incze

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Directory implementation using NIO (moved from Lucene User List)

Posted by "Francesco Bellomi (UNIVR)" <be...@profs.sci.univr.it>.
Hi Julien,
I just subscribed to lucene-dev.

First of all, thank you for your tests.

> On the first run (with Xms 200M and Xmx300M)
>> FSDirectory :: 241 sec
>> NIODirectory :: 206 sec
>
> I tried a second time (there is surely a caching effect in the OS)
>> FSDirectory :: 223 sec
>> NIODirectory :: 182 sec

Roughly, a 20% speedup
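
For reference, the arithmetic behind that estimate can be worked out from the
four timings quoted above. The class SpeedupCalc and its method names are
hypothetical, introduced only for this illustration.

```java
import java.util.Locale;

// Hypothetical helper; only illustrates the arithmetic on the quoted timings.
class SpeedupCalc {

    // Percentage of the baseline time that the candidate saves.
    static double timeSavedPercent(double baselineSec, double candidateSec) {
        return 100.0 * (baselineSec - candidateSec) / baselineSec;
    }

    // How many times faster the candidate is than the baseline.
    static double speedupFactor(double baselineSec, double candidateSec) {
        return baselineSec / candidateSec;
    }

    static String describe(String label, double fsSec, double nioSec) {
        return String.format(Locale.ROOT, "%s: %.1f%% less time, %.2fx",
                label, timeSavedPercent(fsSec, nioSec), speedupFactor(fsSec, nioSec));
    }

    public static void main(String[] args) {
        System.out.println(describe("run 1", 241, 206)); // run 1: 14.5% less time, 1.17x
        System.out.println(describe("run 2", 223, 182)); // run 2: 18.4% less time, 1.23x
    }
}
```

So NIODirectory took about 15-18% less time, or equivalently ran about
1.17-1.23 times as fast, which is consistent with "roughly 20%".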

> In my case and with my hardware configuration, the index
> is too big to be loaded in memory with RAMDirectory.
> On a smaller index or with less RAM available FSDirectory should be
> faster?

Memory mapping deals with the low-level virtual memory system, that is,
(simplifying) it caches in RAM the file blocks when they are accessed, and
then discards the least recently used blocks when more space is needed. The
filesystem has a similar approach, but the caching happens at different
levels, including a Java-based one. I think NIO is faster simply because
it's more direct and low-level. This advantage should also hold with less
RAM and a smaller index, but only testing will tell.
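
The two read paths compared above can be sketched side by side. This is a
rough illustration under stated assumptions, not a benchmark and not the code
from lucene-NIO.zip: the class ReadPathComparison and its methods are
hypothetical names, and the 1024-byte buffer is an arbitrary choice for the
demo. One path copies file blocks into a Java-side buffer first, while the
other reads directly from the mapped pages.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the timings it prints are not a real benchmark.
class ReadPathComparison {

    // Path 1: copy file blocks into a Java-side byte[] buffer, then read from it.
    static long sumViaJavaBuffer(Path file) throws IOException {
        long sum = 0;
        byte[] buffer = new byte[1024]; // arbitrary buffer size for the demo
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int n;
            while ((n = raf.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    sum += buffer[i] & 0xFF;
                }
            }
        }
        return sum;
    }

    // Path 2: read directly from the memory-mapped pages, with no Java-side copy.
    static long sumViaMapping(Path file) throws IOException {
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-paths", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);

        long t0 = System.nanoTime();
        long viaBuffer = sumViaJavaBuffer(file);
        long t1 = System.nanoTime();
        long viaMapping = sumViaMapping(file);
        long t2 = System.nanoTime();

        System.out.println("Java buffer: sum=" + viaBuffer + " in " + (t1 - t0) / 1000 + " us");
        System.out.println("mapping:     sum=" + viaMapping + " in " + (t2 - t1) / 1000 + " us");
    }
}
```

Both methods must return the same sum; any timing difference on a single run
like this says little, so real measurements need a proper test like Julien's.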

It would be interesting to test NIO against RAM, with an index small
enough to be entirely contained in memory. Cached memory-mapped buffers
should (theoretically) be accessed nearly as fast as RAM.

However, the case I'm really interested in is the one you tested (match
against FS with an index bigger than the memory), and I'm happy to see there
is a speedup.

>
> BTW modifications are to be made in the
> org.apache.lucene.store.InputStream class, not in Directory.

You are absolutely right.

>
> Has anybody else tried it? Do you find similar results?
> What does it
> bring on a bigger index?
>
> Cheers
> Julien Nioche

Thanks indeed,
Francesco

-
Francesco Bellomi
"Use truth to show illusion,
and illusion to show truth."





Re: Directory implementation using NIO (moved from Lucene User List)

Posted by Julien Nioche <Ju...@lingway.com>.
Hello Francesco,

I ran a little test using your implementation of NIODirectory on a 384 MB
index using a few hundred different queries on win 2K with 523M RAM.

On the first run (with Xms 200M and Xmx300M)
> FSDirectory :: 241 sec
> NIODirectory :: 206 sec

I tried a second time (there is surely a caching effect in the OS)
> FSDirectory :: 223 sec
> NIODirectory :: 182 sec

Of course the absolute value of the results is not really interesting. In my
case and with my hardware configuration, the index is too big to be loaded
in memory with RAMDirectory.
On a smaller index or with less RAM available FSDirectory should be faster?

BTW modifications are to be made in the org.apache.lucene.store.InputStream
class, not in Directory.

Has anybody else tried it? Do you find similar results? What does it bring
on a bigger index?

Cheers

-----------------------------------
Julien Nioche
Consultant in Linguistic Engineering
www.lingway.com




Re: Directory implementation using NIO

Posted by Scott Ganyo <sc...@etapestry.com>.
Wonderful!  I can't wait to try this.  I'll try to provide some 
comparisons as I get to it, but I'd love to hear from anyone else that 
tries this...

Thanks,
Scott
