Posted to dev@lucenenet.apache.org by "Van Den Berghe, Vincent" <Vi...@bvdinfo.com> on 2017/02/25 21:41:20 UTC
Performance improvement for Lucene.net with memory mapped files.
Hello (again),
During performance analysis with an index of 25 million documents and queries having 50 or more clauses, a hotspot was spotted (no pun intended) in the following ByteBuffer method:
public virtual ByteBuffer Get(byte[] dst, int offset, int length)
{
    CheckBounds(offset, length, dst.Length);
    if (length > Remaining)
        throw new BufferUnderflowException();
    int end = offset + length;
    for (int i = offset; i < end; i++)
        dst[i] = Get();
    return this;
}
This fills the destination buffer one byte at a time, so with the workload above the Get() method ends up being called tens of millions of times. The class MemoryMappedFileByteBuffer, which inherits from ByteBuffer, does the following:
public override byte Get()
{
    return _accessor.ReadByte(Ix(NextGetIndex()));
}
This is horribly inefficient, and it shows: internally, the .NET implementation performs millions of validations of the constrained region, each followed by acquiring the mapped pointer, just to read a single byte.
By providing MemoryMappedFileByteBuffer with its own implementation:
public override ByteBuffer Get(byte[] dst, int offset, int length)
{
    CheckBounds(offset, length, dst.Length);
    if (length > Remaining)
        throw new BufferUnderflowException();
    _accessor.ReadArray(Ix(NextGetIndex(length)), dst, offset, length);
    return this;
}
... an improvement of a factor of 5 or more can be obtained. Startup and query times are greatly improved.
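The difference between the two access patterns can be reproduced outside Lucene.NET. The following is only a minimal sketch, not the Lucene.NET code: the class BulkReadSketch and its methods ReadPerByte, ReadBulk and Demo are hypothetical names introduced for illustration, and the mapping is a small anonymous in-memory one rather than a real index file. It copies the same range once byte by byte and once with a single ReadArray call, then checks that both give identical results.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical helper class, for illustration only (not part of Lucene.NET).
public static class BulkReadSketch
{
    // Per-byte copy: one ReadByte call, and so one position check, per element.
    public static void ReadPerByte(MemoryMappedViewAccessor accessor, long position,
                                   byte[] dst, int offset, int length)
    {
        for (int i = 0; i < length; i++)
            dst[offset + i] = accessor.ReadByte(position + i);
    }

    // Bulk copy: a single ReadArray call for the whole range.
    public static void ReadBulk(MemoryMappedViewAccessor accessor, long position,
                                byte[] dst, int offset, int length)
    {
        // ReadArray returns the number of elements actually read.
        int read = accessor.ReadArray(position, dst, offset, length);
        if (read != length)
            throw new InvalidOperationException("Short read from the mapped view.");
    }

    // Returns true when both strategies copy exactly the same bytes.
    public static bool Demo()
    {
        const int size = 4096;
        using (var mmf = MemoryMappedFile.CreateNew(null, size))
        using (var accessor = mmf.CreateViewAccessor(0, size))
        {
            // Fill the mapping with a recognizable pattern.
            var src = new byte[size];
            for (int i = 0; i < size; i++)
                src[i] = (byte)(i * 31);
            accessor.WriteArray(0, src, 0, size);

            var perByte = new byte[size];
            var bulk = new byte[size];
            ReadPerByte(accessor, 100, perByte, 10, 1000);
            ReadBulk(accessor, 100, bulk, 10, 1000);

            for (int i = 0; i < size; i++)
                if (perByte[i] != bulk[i])
                    return false;

            // Both copies must also match the source pattern at the shifted offset.
            for (int k = 0; k < 1000; k++)
                if (bulk[10 + k] != src[100 + k])
                    return false;
            return true;
        }
    }
}
```

Wrapping each of the two methods in a Stopwatch over a larger mapping is one way to measure the gap on a given machine; the actual ratio will depend on the runtime and the buffer sizes involved.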
Similarly, one can define the corresponding:
public override ByteBuffer Put(byte[] src, int offset, int length)
{
    CheckBounds(offset, length, src.Length);
    if (length > Remaining)
        throw new BufferOverflowException();
    _accessor.WriteArray(Ix(NextPutIndex(length)), src, offset, length);
    return this;
}
... for a similar improvement in write times, but this was not extensively tested.
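Since the write path was not extensively tested, a small round-trip check is one way to gain some confidence in the offset and length handling of WriteArray. This is only a sketch under assumptions: BulkWriteRoundTrip and Check are hypothetical names, and the test runs against an anonymous in-memory mapping rather than a MemoryMappedFileByteBuffer.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical round-trip check, for illustration only (not part of Lucene.NET).
public static class BulkWriteRoundTrip
{
    // Returns true when a bulk write lands exactly where expected and nowhere else.
    public static bool Check()
    {
        const int capacity = 1024;
        using (var mmf = MemoryMappedFile.CreateNew(null, capacity))
        using (var accessor = mmf.CreateViewAccessor(0, capacity))
        {
            var src = new byte[256];
            for (int i = 0; i < src.Length; i++)
                src[i] = (byte)i;

            // Sentinels just before and just after the target range.
            accessor.Write(199, (byte)0xAA);
            accessor.Write(300, (byte)0xAA);

            // Bulk-write src[16..116) to positions [200..300) of the view.
            accessor.WriteArray(200, src, 16, 100);

            // Every written byte must read back unchanged.
            for (int k = 0; k < 100; k++)
                if (accessor.ReadByte(200 + k) != src[16 + k])
                    return false;

            // The neighbouring bytes must not have been overwritten.
            return accessor.ReadByte(199) == 0xAA && accessor.ReadByte(300) == 0xAA;
        }
    }
}
```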
Do with this information as you please.
Vincent