Posted to dev@lucenenet.apache.org by "Van Den Berghe, Vincent" <Vi...@bvdinfo.com> on 2017/02/25 21:41:20 UTC

Performance improvement for Lucene.net with memory mapped files.

Hello (again),

During performance analysis with an index of 25 million documents and queries having 50 or more clauses, a hotspot was spotted (no pun intended) in the following ByteBuffer method:

        public virtual ByteBuffer Get(byte[] dst, int offset, int length)
        {
            CheckBounds(offset, length, dst.Length);
            if (length > Remaining)
                throw new BufferUnderflowException();
            int end = offset + length;
            for (int i = offset; i < end; i++)
                dst[i] = Get();

            return this;
        }


This fills a buffer by calling the Get() method tens of millions of times. The class MemoryMappedFileByteBuffer, which inherits from ByteBuffer, does the following:

        public override byte Get()
        {
            return _accessor.ReadByte(Ix(NextGetIndex()));
        }


This is horribly inefficient, and it shows: for every single byte, the .NET implementation internally validates the requested position against the bounds of the view and then acquires the mapped pointer, so filling one buffer repeats that work millions of times.
By providing MemoryMappedFileByteBuffer with its own implementation:

        public override ByteBuffer Get(byte[] dst, int offset, int length)
        {
            CheckBounds(offset, length, dst.Length);
            if (length > Remaining)
                throw new BufferUnderflowException();
            _accessor.ReadArray(Ix(NextGetIndex(length)), dst, offset, length);
            return this;
        }

... a speedup of a factor of 5 or more can be obtained. Startup and query times are greatly improved.
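
For anyone who wants to reproduce the effect outside Lucene.net, here is a minimal standalone sketch. It is not the Lucene.net code: the class name BulkReadSketch, the method ReadsMatch and the fill pattern are hypothetical, and it uses an anonymous in-memory mapping rather than an index file. It only contrasts a ReadByte loop with a single ReadArray call on a MemoryMappedViewAccessor; the timings it prints depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.IO.MemoryMappedFiles;

public static class BulkReadSketch
{
    // Reads 'size' bytes twice from an anonymous mapping, once byte by byte
    // and once in bulk, prints both timings and reports whether the results agree.
    public static bool ReadsMatch(int size)
    {
        // Hypothetical test data: a repeating pattern, so a wrong read is detectable.
        var pattern = new byte[size];
        for (int i = 0; i < size; i++)
            pattern[i] = (byte)(i % 251);

        using (var mmf = MemoryMappedFile.CreateNew(null, size))
        using (var accessor = mmf.CreateViewAccessor(0, size))
        {
            accessor.WriteArray(0, pattern, 0, size);

            // Slow path: one accessor call (with its own checks) per byte.
            var perByte = new byte[size];
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < size; i++)
                perByte[i] = accessor.ReadByte(i);
            sw.Stop();
            Console.WriteLine("ReadByte loop: " + sw.ElapsedMilliseconds + " ms");

            // Fast path: one accessor call for the whole range.
            var bulk = new byte[size];
            sw.Restart();
            int read = accessor.ReadArray(0, bulk, 0, size);
            sw.Stop();
            Console.WriteLine("ReadArray:     " + sw.ElapsedMilliseconds + " ms");

            if (read != size)
                return false;
            for (int i = 0; i < size; i++)
                if (perByte[i] != bulk[i] || bulk[i] != pattern[i])
                    return false;
            return true;
        }
    }

    public static void Main()
    {
        // 16 MiB is an arbitrary size chosen for the illustration.
        Console.WriteLine("Results match: " + ReadsMatch(16 * 1024 * 1024));
    }
}
```

The check at the end matters as much as the timing: the bulk path is only a valid replacement if it returns exactly the bytes the per-byte path would have returned.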
Similarly, one can define the corresponding Put override:

        public override ByteBuffer Put(byte[] src, int offset, int length)
        {
            CheckBounds(offset, length, src.Length);
            if (length > Remaining)
                throw new BufferOverflowException();
            _accessor.WriteArray(Ix(NextPutIndex(length)), src, offset, length);
            return this;
        }


... for a similar improvement in write times, but this was not extensively tested.
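
Since the write path was not extensively tested, a round-trip check is a cheap first step. The following is a sketch under assumptions, not part of Lucene.net: BulkWriteSketch and RoundTrip are hypothetical names, the mapping is anonymous, and length is assumed to be greater than zero. It writes a slice of an array with WriteArray at a non-zero position, reads it back with ReadArray, and compares only the slice that was written.

```csharp
using System;
using System.IO.MemoryMappedFiles;

public static class BulkWriteSketch
{
    // Writes src[offset .. offset+length) at 'position' in an anonymous mapping,
    // reads it back into the same slice of a fresh array, and compares the slices.
    // Assumes length > 0 and that offset/length are valid for src.
    public static bool RoundTrip(byte[] src, int offset, int length, long position)
    {
        long capacity = position + length;
        using (var mmf = MemoryMappedFile.CreateNew(null, capacity))
        using (var accessor = mmf.CreateViewAccessor(0, capacity))
        {
            accessor.WriteArray(position, src, offset, length);

            var dst = new byte[src.Length];
            int read = accessor.ReadArray(position, dst, offset, length);
            if (read != length)
                return false;

            for (int i = offset; i < offset + length; i++)
                if (dst[i] != src[i])
                    return false;
            return true;
        }
    }

    public static void Main()
    {
        var data = new byte[1000];
        new Random(42).NextBytes(data);
        // Arbitrary slice and position chosen for the illustration.
        Console.WriteLine("Round trip ok: " + RoundTrip(data, 100, 500, 37));
    }
}
```

Using a non-zero array offset and a non-zero file position in the check helps catch the most likely mistake in such an override, namely mixing up the buffer position with the array offset.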

Do with this information as you please.

Vincent