Posted to dev@lucene.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/10/26 23:30:55 UTC

[jira] Commented: (LUCENE-414) Java NIO patch against Lucene 1.9

    [ http://issues.apache.org/jira/browse/LUCENE-414?page=comments#action_12356015 ] 

Doug Cutting commented on LUCENE-414:
-------------------------------------

The channels should all be opened when the IndexInput is created, as files can subsequently get deleted.

Also, I'm not sure why this uses nio.  Classic io would also permit you to have multiple file handles per file, for more parallel io.  So you could just patch FSDirectory to permit that, no?

Finally, if files are on a single drive, then the concurrency improvements are probably negligible.  This would only really pay off with a RAID, where different parts of a file are stored on different physical devices.  Or am I missing something?
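
To make the classic-io alternative above concrete, here is a minimal Java sketch of several RandomAccessFile handles opened on one file. The class name MultiHandleFile and the round-robin choice of handle are assumptions made for illustration; this is not code from the patch or from FSDirectory. All handles are opened in the constructor, following the point above that files can be deleted later.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper, not part of Lucene: keeps several classic-io handles
 * open on one file so that concurrent readers lock different handles.
 */
class MultiHandleFile implements AutoCloseable {
    private final RandomAccessFile[] handles;
    private final AtomicInteger next = new AtomicInteger();

    MultiHandleFile(File file, int handleCount) throws IOException {
        if (handleCount < 1) {
            throw new IllegalArgumentException("handleCount must be at least 1");
        }
        handles = new RandomAccessFile[handleCount];
        try {
            // Open every handle up front rather than lazily.
            for (int i = 0; i < handleCount; i++) {
                handles[i] = new RandomAccessFile(file, "r");
            }
        } catch (IOException e) {
            close();   // release the handles that were already opened
            throw e;
        }
    }

    /** Reads up to len bytes at the given offset; returns bytes read, or -1 at end of file. */
    int read(long position, byte[] buf, int off, int len) throws IOException {
        // Round-robin pick; floorMod keeps the index non-negative after int overflow.
        RandomAccessFile raf = handles[Math.floorMod(next.getAndIncrement(), handles.length)];
        synchronized (raf) {   // seek and read must not interleave on one handle
            raf.seek(position);
            return raf.read(buf, off, len);
        }
    }

    @Override
    public void close() throws IOException {
        for (RandomAccessFile raf : handles) {
            if (raf != null) {
                raf.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile w = new RandomAccessFile(f, "rw")) {
            w.write(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        }
        try (MultiHandleFile m = new MultiHandleFile(f, 4)) {
            byte[] buf = new byte[4];
            int n = m.read(4, buf, 0, buf.length);
            System.out.println(n + " bytes read, first byte = " + buf[0]);
        }
    }
}
```

Each handle costs one file descriptor, so the handle count trades descriptor usage against lock contention.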

> Java NIO patch against Lucene 1.9
> ---------------------------------
>
>          Key: LUCENE-414
>          URL: http://issues.apache.org/jira/browse/LUCENE-414
>      Project: Lucene - Java
>         Type: Bug
>   Components: Store
>     Versions: unspecified
>  Environment: Operating System: All
> Platform: All
>     Reporter: Chris Lamprecht
>     Assignee: Lucene Developers
>  Attachments: MemoryLRUCache.java, NioFile.java, nio-lucene-1.9.patch
>
> Robert Engels previously submitted a patch against Lucene 1.4 for a Java NIO-
> based Directory implementation.  It also included some changes to FSDirectory 
> to allow better concurrency when searching from multiple threads.  The 
> complete thread is at:
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/%3cLMENLAOACIBLMOIILNNNEEOEEPAA.rengels@ix.netcom.com%3e
> This thread ended with Doug Cutting suggesting that someone port Robert's 
> changes to the SVN trunk.  This is what I've done in this patch.
> There are two parts to the patch.  The first part modifies FieldsReader, 
> CompoundFileReader, and SegmentReader, to allow better concurrency when 
> reading an index.  The second part includes the new NioFSDirectory 
> implementation, and makes small changes to FSDirectory and IndexInput to 
> accommodate this change.  I'll put a more detailed outline of the changes to 
> each file in a separate message.
> To use the new NioFSDirectory, set the system property 
> org.apache.lucene.FSDirectory.class to 
> org.apache.lucene.store.NioFSDirectory.  This will cause 
> FSDirectory.getDirectory() to return an NioFSDirectory instance.  By default, 
> NioFile limits the number of concurrent channels to 4, but you can override 
> this by setting the system property org.apache.lucene.nio.channels.  
> I did some performance tests with these patches.  The biggest improvement came 
> from the concurrency improvements.  NioFSDirectory performed about the same as 
> FSDirectory (with the concurrency improvements).  
> I ran my tests under Fedora Core 1; uname -a reports:
> Linux myhost 2.4.22-1.2199.nptlsmp #1 SMP Wed Aug 4 11:48:29 EDT 2004 i686 
> i686 i386 GNU/Linux
> The machine is a dual Xeon 2.8GHz with 4GB RAM, and the tests were run against 
> a 9GB compound index file.  The tests were run "hot" -- with everything 
> already cached by Linux's filesystem cache.  The numbers are:
> FSDirectory without patch:          13.3 searches per second
> FSDirectory WITH concurrency patch: 14.3 searches per second
> Both tests were run with 6 concurrent threads, which gave the highest numbers 
> in each case.  I suspect that the concurrency improvements would make a bigger 
> difference on a more realistic test where the index isn't all cached in RAM 
> already, since the I/O happens while holding the synchronized lock.  Patches to 
> follow...
> Thoughts?
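
The two system properties named in the quoted report could be set and read as in the following Java sketch. The property names, the NioFSDirectory class name, and the default of 4 are taken from the report; the class NioDirectoryConfig and the use of Integer.getInteger are assumptions for illustration, and the patch's NioFile may parse the value differently.

```java
/** Illustrative only: exercises the two system properties named in the report. */
class NioDirectoryConfig {
    // Property names quoted from the report.
    static final String DIRECTORY_CLASS_PROPERTY = "org.apache.lucene.FSDirectory.class";
    static final String CHANNELS_PROPERTY = "org.apache.lucene.nio.channels";
    // The report gives 4 as NioFile's default channel limit.
    static final int DEFAULT_CHANNELS = 4;

    /** Requests NioFSDirectory for FSDirectory.getDirectory(), as the report describes. */
    static void selectNioDirectory() {
        System.setProperty(DIRECTORY_CLASS_PROPERTY, "org.apache.lucene.store.NioFSDirectory");
    }

    /** Assumed way to read the limit; falls back to the default when unset or not a number. */
    static int channelLimit() {
        return Integer.getInteger(CHANNELS_PROPERTY, DEFAULT_CHANNELS);
    }

    public static void main(String[] args) {
        selectNioDirectory();
        System.out.println(System.getProperty(DIRECTORY_CLASS_PROPERTY));
        System.out.println("channel limit: " + channelLimit());
    }
}
```

The same settings can be passed on the command line with -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.NioFSDirectory and -Dorg.apache.lucene.nio.channels=8.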

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: [jira] Commented: (LUCENE-414) Java NIO patch against Lucene 1.9

Posted by Robert Engels <re...@ix.netcom.com>.
You are correct, this is to get around JDK bug 6265734. (I originally cited
the bug, but the test code attached to it seems to bear out that my
assessment is correct.) It should be documented in the code that this is a
work-around (and that it does increase the number of file handles needed).
I will look into whether using multiple RandomAccessFiles makes any
performance difference.

I am not sure how to benchmark this. I state this from my understanding of
optimizing disk subsystems, but I am sure it is very hardware dependent. I
do know from reading the SCSI documentation, and other Ultra ATA documentation,
that the controller will coalesce requests, so you need to get multiple
requests to the controller. If the thread blocks in Java, you will never get
multiple requests to the controller.
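
As a rough illustration of issuing several reads at once from the JVM side, the Java sketch below shares one FileChannel among a pool of threads and uses positional reads, which do not move the channel's own position and so need no Java-level lock around seek and read. The class PositionalReads and its method names are hypothetical. Whether the reads actually proceed in parallel below the JVM depends on the platform and JDK, so this shows the technique only and is not evidence of a speedup.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example class, not taken from the patch. */
class PositionalReads {
    /** Reads exactly len bytes starting at position, leaving the channel's position untouched. */
    static byte[] readAt(FileChannel ch, long position, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, position + buf.position());
            if (n < 0) {
                throw new EOFException("read past end of file");
            }
        }
        return buf.array();
    }

    /** Submits one read per offset to a thread pool; all tasks share a single channel. */
    static List<byte[]> readAll(Path file, long[] offsets, int len, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (long off : offsets) {
                futures.add(pool.submit(() -> readAt(ch, off, len)));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                results.add(f.get());   // wait for each read before the channel is closed
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("index", ".bin");
        Files.write(p, new byte[4096]);
        List<byte[]> blocks = readAll(p, new long[] {0, 1024, 2048, 3072}, 512, 4);
        System.out.println(blocks.size() + " blocks read");
        Files.delete(p);
    }
}
```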

I am working on a performance test case for the caching right now...



-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: Wednesday, October 26, 2005 4:51 PM
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-414) Java NIO patch against
Lucene 1.9


Robert Engels wrote:
> The reason for using Nio and not IO is that IO requires multiple file handles
> per file. There are already numerous bugs/work-arounds in Lucene to limit
> the use of file handles (as this is an OS-limited resource), so I did not
> wish to further increase the number of file descriptors needed.

Yes, but it appears to me that the submitted NioFile class opens a new
file handle per channel.  So I don't see how this addresses that.

> Your statement that a RAID system would be needed to exploit the added
> concurrency is not exactly correct. By using multiple threads, even if the
> disk is busy handling a request, the OS can combine the pending requests and
> perform more efficient reads to the disk subsystem when it becomes
> available.

Perhaps.  It would be nice to see a benchmark demonstrating this.

> I also dispute the performance numbers cited. In my testing the 'user
> level' cache improved performance of query operations nearly 100%. I will
> write a testcase to demonstrate the increased performance. This testcase can
> be written independent of Lucene.

Can you provide your benchmark results?

Thanks,

Doug


Re: [jira] Commented: (LUCENE-414) Java NIO patch against Lucene 1.9

Posted by Doug Cutting <cu...@apache.org>.
Robert Engels wrote:
> The reason for using Nio and not IO is that IO requires multiple file handles per file. There are already numerous bugs/work-arounds in Lucene to limit the use of file handles (as this is an OS-limited resource), so I did not wish to further increase the number of file descriptors needed.

Yes, but it appears to me that the submitted NioFile class opens a new 
file handle per channel.  So I don't see how this addresses that.

> Your statement that a RAID system would be needed to exploit the added concurrency is not exactly correct. By using multiple threads, even if the disk is busy handling a request, the OS can combine the pending requests and perform more efficient reads to the disk subsystem when it becomes available.

Perhaps.  It would be nice to see a benchmark demonstrating this.
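
One possible shape for such a benchmark, independent of Lucene, is sketched below in Java: it counts how many random positional reads a given number of threads completes within a fixed time. The class ReadThroughput, the block size, the durations, and the thread counts are all assumptions chosen for illustration; no results are implied. For the single-drive question the file would need to be larger than RAM or read with a cold OS cache, since a "hot" run mostly measures locking and CPU cost rather than disk behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical micro-benchmark harness, not taken from the patch. */
class ReadThroughput {
    /** Runs the given number of reader threads for about millis ms; returns completed reads. */
    static long countReads(Path file, int threads, int blockSize, long millis) throws Exception {
        AtomicLong reads = new AtomicLong();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long span = ch.size() - blockSize;   // highest valid start offset
            if (span < 0) {
                throw new IllegalArgumentException("file is smaller than the block size");
            }
            long deadline = System.nanoTime() + millis * 1_000_000L;
            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(blockSize);
                    try {
                        while (deadline - System.nanoTime() > 0) {
                            buf.clear();
                            long pos = ThreadLocalRandom.current().nextLong(span + 1);
                            ch.read(buf, pos);   // positional read at a random offset
                            reads.incrementAndGet();
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join();   // all readers finish before the channel is closed
            }
        }
        return reads.get();
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);   // path to a large existing file
        for (int threads : new int[] {1, 2, 6}) {
            long n = countReads(file, threads, 4096, 2000);
            System.out.println(threads + " threads: " + (n / 2.0) + " reads per second");
        }
    }
}
```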

> I also dispute the performance numbers cited. In my testing the 'user level' cache improved performance of query operations nearly 100%. I will write a testcase to demonstrate the increased performance. This testcase can be written independent of Lucene.

Can you provide your benchmark results?

Thanks,

Doug


RE: [jira] Commented: (LUCENE-414) Java NIO patch against Lucene 1.9

Posted by Robert Engels <re...@ix.netcom.com>.
The reason for using Nio and not IO is that IO requires multiple file handles per file. There are already numerous bugs/work-arounds in Lucene to limit the use of file handles (as this is an OS-limited resource), so I did not wish to further increase the number of file descriptors needed.

Your statement that a RAID system would be needed to exploit the added concurrency is not exactly correct. By using multiple threads, even if the disk is busy handling a request, the OS can combine the pending requests and perform more efficient reads to the disk subsystem when it becomes available.

I also dispute the performance numbers cited. In my testing the 'user level' cache improved performance of query operations nearly 100%. I will write a testcase to demonstrate the increased performance. This testcase can be written independent of Lucene.
