Posted to java-user@lucene.apache.org by Michael Poindexter <st...@gmail.com> on 2013/03/17 03:13:56 UTC

NIO2 Directory implementations

As part of a project using Lucene I have implemented a trio of Directories
roughly corresponding to the FSDirectory implementations in core.  These
directory implementations use the NIO2 APIs in JDK7 when opening files.
This ensures that on Windows the files are opened in a mode that allows
deletion even if the file is open elsewhere.
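
To make the difference concrete, here is a minimal sketch of obtaining a
channel through the NIO2 API and mapping it, in place of going through
RandomAccessFile.  It is not the code from the directories described in
this post; the class name Nio2OpenSketch and the helper mapReadOnly are
made up for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Lucene.
public class Nio2OpenSketch {

    /** Maps the whole file read-only through a channel opened with NIO2. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        // NIO2 style: FileChannel.open(path, options) instead of
        // new RandomAccessFile(file, "r").getChannel().
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio2-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, 4, 5});

        MappedByteBuffer buffer = mapReadOnly(file);
        System.out.println("mapped bytes: " + buffer.remaining());
    }
}
```

A directory that does not want mmap would keep the FileChannel open and use
its positional read(ByteBuffer, long) instead of calling map().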

1.) JDK7MMapDirectory - Roughly the same as MMapDirectory.  Uses
FileChannel.open (instead of RandomAccessFile) to create a FileChannel that
then has map() called on it to create the mapped buffers.
2.) JDK7NIOFSDirectory - Roughly the same as NIOFSDirectory, but uses
FileChannel.open to create the file channel instead of RandomAccessFile.
3.) JDK7AsyncFSDirectory - This one is new and different.  I needed a
replacement for SimpleFSDirectory that is not susceptible to problems when
a thread is interrupt()'ed and does not have the synchronization problems
that NIOFSDirectory has on Windows.  It can be used wherever
SimpleFSDirectory could have been used, but uses an AsynchronousFileChannel
to do its work.  Each operation is still synchronous from the caller's
point of view, but on Windows AsynchronousFileChannel uses overlapped IO,
and hence does not require synchronization on the position and should be
safe for interrupts.
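
The third approach can be sketched roughly as follows: issue a positional
read on an AsynchronousFileChannel and block on the returned Future, so the
caller still sees a synchronous read.  This is an assumption about the
general shape, not the actual JDK7AsyncFSDirectory code; the class
AsyncReadSketch and the helper readFully are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration only; not part of Lucene.
public class AsyncReadSketch {

    /**
     * Reads into dst starting at the given file position, blocking until
     * dst is full or end of file is reached. Returns the bytes read.
     */
    static int readFully(AsynchronousFileChannel channel, ByteBuffer dst, long position)
            throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            // Every read carries its own position, so there is no shared
            // file pointer that callers would need to synchronize on.
            Future<Integer> pending = channel.read(dst, position + total);
            int n;
            try {
                n = pending.get(); // block until this read completes
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report the failure.
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for read", e);
            } catch (ExecutionException e) {
                throw new IOException(e.getCause());
            }
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return total;
    }
}
```

The channel would be opened once with AsynchronousFileChannel.open(path,
StandardOpenOption.READ) and shared by all readers of the file.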

A few questions:
1.)  Is there any interest in me contributing these to Lucene?  They
require JDK7+, but perhaps they could go in a contrib module?
2.)  While implementing these I noticed the implementation of
FSDirectory.sync seems a little strange:  It just opens a new
RandomAccessFile and forces a sync using it.  The JavaDocs seem to imply
that this forces a sync on the file handle associated with that
RandomAccessFile, but that is not the file handle that was written to as
part of an IndexOutput.  On Windows at least this won't matter, but it
seems theoretically wrong: if I am understanding the JavaDoc correctly, on
some platforms this style of operation could have no effect.  It seems like
maybe it would be better to have a sync() call on an IndexOutput that can
be called before closing it.  Am I missing something here?
3.)  What is the best way to go about benchmarking/testing these new
implementations to compare against the core FSDirectory implementations?
 I've seen some references to randomized tests and benchmarks on the
developer pages on the Lucene website, but I didn't see anything that was
along the lines of "Here's how to run the benchmarks"...any pointers would
be much appreciated.

Thanks,
Mike Poindexter

Re: NIO2 Directory implementations

Posted by Michael McCandless <lu...@mikemccandless.com>.
These directory implementations sound very interesting!

Yes, please open a Jira issue and attach a patch.

Some responses below:

On Sat, Mar 16, 2013 at 10:13 PM, Michael Poindexter
<st...@gmail.com> wrote:
> As part of a project using Lucene I have implemented a trio of Directories
> roughly corresponding to the FSDirectory implementations in core.  These
> directory implementations use the NIO2 APIs in JDK7 when opening files.
>  This ensures that on Windows the files are opened in a mode that allows
> deletion even if the file is open elsewhere.
>
> 1.) JDK7MMapDirectory - Roughly the same as MMapDirectory.  Uses
> FileChannel.open (instead of RandomAccessFile) to create a FileChannel that
> then has map() called on it to create the mapped buffers.
> 2.) JDK7NIOFSDirectory - Roughly the same as NIOFSDirectory, but uses
> FileChannel.open to create the file channel instead of RandomAccessFile.
> 3.) JDK7AsyncFSDirectory - This one is new and different.  I needed a
> replacement for SimpleFSDirectory (that was not susceptible to problems if
> interrupt()'ed) and did not have the synchronization problems on Windows of
> NIOFSDirectory.  This one is used where SimpleFSDirectory could have been
> used, but uses an AsynchronousFileChannel to do its work.  The actual
> operation is still synchronous, but on Windows AsynchronousFileChannel uses
> overlapped IO, and hence does not require synchronization on the position
> and should be safe for interrupts.

Awesome!

On Unix would this impl also be safe for interrupts?

> A couple of questions:
> 1.)  Is there any interest in me contributing these to Lucene?  They
> require JDK7+, but perhaps they could go in a contrib module?

Maybe in the misc module (lucene/misc)?

> 2.)  While implementing these I noticed the implementation of
> FSDirectory.sync seems a little strange:  It just opens a new
> RandomAccessFile and forces a sync using it.  The JavaDocs seem to imply
> that this would force a sync on the file handle associated with the
> RandomAccessFile, but that's not the file handle that was written to as
> part of an IndexOutput.  On Windows at least this won't matter, but it
> seems theoretically wrong...i.e. according to the JavaDoc on a given
> platform this style of operation could have no impact if I am understanding
> it correctly.  It seems like maybe it would be better to have a sync() call
> on an IndexOutput that can be called before closing it...am I missing
> something here?

Yes, this is indeed very strange: ideally we'd fsync on the
IndexOutput before it was closed, but this is unfortunately tricky to
do in Lucene because at the time we write to the IndexOutput we don't
know if it will be a file that will be committed in the future.

This was also raised in https://issues.apache.org/jira/browse/LUCENE-3237
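
For readers following along, the two sync styles under discussion can be
sketched like this.  This is not Lucene's actual code, only an
illustration of the pattern described in the question; the class
SyncSketch and both method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Lucene.
public class SyncSketch {

    /** Reopen-and-sync: a fresh handle is opened only to call sync on it. */
    static void syncByReopening(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Syncs via the new descriptor, not the one used for writing.
            raf.getFD().sync();
        }
    }

    /** Sync on the writing handle itself, before it is closed. */
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel out = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                out.write(buf);
            }
            // force(true) asks for content and metadata to reach storage.
            out.force(true);
        }
    }
}
```

The second form needs to know at write time that the file will be
committed, which is the difficulty mentioned above.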

> 3.)  What is the best way to go about benchmarking/testing these new
> implementations to compare against the core FSDirectory implementations?
>  I've seen some references to randomized tests and benchmarks on the
> developer pages on the Lucene website, but I didn't see anything that was
> along the lines of "Here's how to run the benchmarks"...any pointers would
> be much appreciated.

I think start with an issue/patch and then others can help with
benchmarking?  I would use luceneutil to run a standard
indexing/searching test using the Wikipedia corpus.

> Thanks,

Thank you!

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: NIO2 Directory implementations

Posted by Uwe Schindler <uw...@thetaphi.de>.
We are planning to move to Java 7 in Lucene trunk (5.0), so your input is fine! Just file a JIRA issue and attach the code!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



