Posted to dev@lucene.apache.org by Tate Avery <ta...@nstein.com> on 2003/10/17 21:03:25 UTC

FileChannel implementation of Directory

Hello,

I was reading a posting from Doug Cutting (circa 2001) that stated the following:

"Multi-CPU and/or multi-disk systems can provide greater parallelism and hence query throughput. However Lucene's FSDirectory serializes reads to a given file (since it only has a single file descriptor per file) which limits i/o parallelism. Someone with a large disk array would be better served by a Directory implementation that uses Java 1.4's new i/o classes. In particular, the FileChannel class supports reads that do not move the file pointer, so that multiple reads on the same file can be in progress at the same time."

I attempted to implement this suggestion, but without great success.

Basically, I copied the existing FSDirectory (from 1.3-rc1) and modified the FCInputStream inner class.  I changed it to get a FileChannel (channel) in the constructor and to clone properly.  But, mainly, I changed "readInternal" to look like this:

	protected void readInternal(byte[] b, int offset, int len)
		throws IOException
	{
		channel.read(ByteBuffer.wrap(b, offset, len), getFilePointer());
	}

In other words, wrap the byte array, let the channel do the reading, and get the current file pointer from the super class.
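
One detail worth checking in a readInternal like the one above: FileChannel.read(ByteBuffer, long) returns the number of bytes it actually read, which may be fewer than requested, so a single call is not guaranteed to fill the array. A minimal sketch of a loop that does fill it follows. The class name PositionalReads and the helper readFully are hypothetical names for illustration, not part of Lucene or of the code I tested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper, not a Lucene class. */
public final class PositionalReads {
    private PositionalReads() {}

    /**
     * Fills b[offset .. offset+len) from the channel, starting at the given
     * absolute file position. The channel's own position is never touched,
     * so several threads can call this on one shared channel.
     */
    public static void readFully(FileChannel channel, long position,
                                 byte[] b, int offset, int len) throws IOException {
        ByteBuffer dst = ByteBuffer.wrap(b, offset, len);
        long pos = position;
        while (dst.hasRemaining()) {
            // A positional read may return fewer bytes than requested.
            int n = channel.read(dst, pos);
            if (n < 0) {
                throw new EOFException("read past EOF at position " + pos);
            }
            pos += n;
        }
    }
}
```

With a helper like this, readInternal would reduce to a single call along the lines of PositionalReads.readFully(channel, getFilePointer(), b, offset, len).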

It works fine: the same queries return the same results, etc.  But, in overall response time, the new Directory implementation consistently falls a few ms short of the old one (over sustained trials with various amounts of concurrency).  The old one usually wins out for both 'querying' (i.e. Searcher.search) and loading (i.e. Hits.doc(i)).

According to the FileChannel API, absolute reads should be able to occur concurrently.  However, the existing FSDirectory serializes access to the underlying files.  So, I figured that FSDirectory would be faster with a single search thread... but FileChannelDirectory would win with multiple threads.  Apparently, not so (given my implementation :-).  I tested on both a regular IDE hard drive and a SCSI drive.  Both tests, however, were Win2k based.
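
To isolate the two read strategies from the rest of Lucene, something like the following standalone harness could be used. It is only a sketch under stated assumptions: the class ReadStrategyBenchmark and all of its members are hypothetical names, the "serialized" reader merely imitates the seek-then-read-under-a-lock pattern described above rather than reproducing FSDirectory, and the code uses newer Java features (lambdas, try-with-resources) for brevity. The numbers it prints depend heavily on the OS file cache and the disk hardware.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical micro-benchmark, not part of Lucene. */
public final class ReadStrategyBenchmark {

    /** Reads exactly dst.length bytes starting at an absolute file position. */
    public interface BlockReader {
        void read(long position, byte[] dst) throws IOException;
    }

    /** One shared file pointer: seek + read must be atomic, so callers are serialized. */
    public static BlockReader serialized(RandomAccessFile file) {
        return (position, dst) -> {
            synchronized (file) {
                file.seek(position);
                file.readFully(dst);
            }
        };
    }

    /** Positional reads: no shared file pointer, so no lock in this code. */
    public static BlockReader positional(FileChannel channel) {
        return (position, dst) -> {
            ByteBuffer buf = ByteBuffer.wrap(dst);
            while (buf.hasRemaining()) {
                if (channel.read(buf, position + buf.position()) < 0) {
                    throw new EOFException("read past EOF");
                }
            }
        };
    }

    /** Runs random block reads on `threads` workers; returns elapsed nanoseconds. */
    public static long time(BlockReader reader, long fileSize, int threads,
                            int readsPerThread, int blockSize) throws Exception {
        long maxStart = fileSize - blockSize;
        if (maxStart < 0) {
            throw new IllegalArgumentException("file is smaller than one block");
        }
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            tasks.add(() -> {
                byte[] dst = new byte[blockSize];
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < readsPerThread; i++) {
                    reader.read(rnd.nextLong(maxStart + 1), dst);
                }
                return null;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any failure from a worker
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ReadStrategyBenchmark <file> <threads>");
            return;
        }
        int threads = Integer.parseInt(args[1]);
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel()) {
            long size = raf.length();
            System.out.println("serialized ns: " + time(serialized(raf), size, threads, 1000, 1024));
            System.out.println("positional ns: " + time(positional(channel), size, threads, 1000, 1024));
        }
    }
}
```

Running it with increasing thread counts against a file larger than available RAM would show whether the positional reader pulls ahead on a given machine. If both readers stay close even there, the bottleneck is probably below the Java code, in the OS or the disk.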


Does anyone know why I might not be seeing a performance increase for multiple concurrent threads using my "FileChannelDirectory"?


Any ideas would be appreciated.


Thank you,
Tate

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: FileChannel implementation of Directory

Posted by Doug Cutting <cu...@lucene.com>.
Tate Avery wrote:
> Does anyone know why I might not be seeing a performance increase for multiple concurrent threads using my "FileChannelDirectory"?

Are you using a RAID system?  If the data is already cached in RAM, then 
the i/o calls may be so fast that concurrency doesn't make things 
noticeably faster.  If the data is not already cached, then, with a 
single disk, the OS will have to serialize the i/o requests to that 
drive, so there's no opportunity for concurrency.  If however you have 
an index that is too large to be cached in RAM, your query stream is 
diverse enough so that it cannot be cached, and you have a RAID-based 
file system which can support multiple concurrent i/o operations, then 
you may see a speedup.  Does that make sense?

Doug

