Posted to dev@lucene.apache.org by Earwin Burrfoot <ea...@gmail.com> on 2009/03/29 00:43:28 UTC

Possible IndexInput optimization

While drooling over MappedBigByteBuffer, which we'll (hopefully) see
in JDK7, I revisited my own Directory code and noticed a certain
peculiarity, shared by the Lucene core classes:
each and every IndexInput implementation only implements readByte()
and readBytes(), never overriding the readInt/VInt/Long/etc. methods.

Currently RAMDirectory uses a list of byte arrays as a backing store.
I got some speedup when I switched to a custom version that knows each
file's size beforehand and can thus allocate a single byte array of
exactly the needed length (deliberately accepting a 2Gb file size
limitation). Nothing strange here: the readByte(s) methods are easily
the most often called ones in a Lucene app, and they were greatly
simplified. readByte became merely:
public byte readByte() throws IOException {
    // Bounds checking dropped, relying on the natural
    // ArrayIndexOutOfBoundsException; we can't easily catch and
    // recover from it anyway.
    return buffer[position++];
}

But now readInt is four readByte calls, readLong is two readInts (ten
calls in total), and readString is god knows how many. Unless you use
a single type of Directory throughout the lifetime of your
application, these readByte calls are never inlined, and the JIT's
invokevirtual short-circuit optimization (which skips the method
lookup when a call site keeps resolving to the same implementation)
cannot be applied either.

There are three cases where we can override the readNNN methods and
provide implementations with zero or minimal nested method
invocations: RAMDirectory, MMapDirectory, and BufferedIndexInput for
FSDirectory/CompoundFileReader. Has anybody tried this?
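For illustration, a minimal sketch of what such an override could look like for a single-byte-array input. The class and field names here are hypothetical, not Lucene's actual API; the point is that readInt collapses into four direct array accesses with no nested method calls:

```java
// Hypothetical single-array input, sketching the readNNN override idea.
class SingleArrayInput {
    private final byte[] buffer;
    private int position;

    SingleArrayInput(byte[] buffer) {
        this.buffer = buffer;
    }

    byte readByte() {
        return buffer[position++]; // bounds left to the natural AIOOBE, as above
    }

    // Overridden readInt: four direct array accesses, zero readByte calls.
    int readInt() {
        int p = position;
        position = p + 4;
        return ((buffer[p] & 0xFF) << 24)
             | ((buffer[p + 1] & 0xFF) << 16)
             | ((buffer[p + 2] & 0xFF) << 8)
             |  (buffer[p + 3] & 0xFF);
    }

    public static void main(String[] args) {
        SingleArrayInput in = new SingleArrayInput(new byte[] {0, 0, 1, 2});
        assert in.readInt() == 258; // big-endian, matching Lucene's on-disk format
    }
}
```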


-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Possible IndexInput optimization

Posted by Earwin Burrfoot <ea...@gmail.com>.
>> In my case I have to switch to MMap/Buffers, Java behaves ugly with
>> 8Gb heaps.
> Do you mean that because garbage collection does not perform well
> on these larger heaps, one should avoid creating arrays that require
> heaps of that size, and rather use (direct) MMap/Buffers?
Yes, exactly. Keeping big Directories in heap is painful in many ways:
1. Old-gen GC is slow on big heaps. Our 3Gb heaps were collected for
6-8 seconds with the parallel collector on four-way machines. The
concurrent collector consistently core dumps, whatever the settings :)
Then we tried increasing the heaps (up to 8Gb) in pursuit of fewer
machines in the cluster, and it just collected for an eternity.
2. The eden-survivor-old chain showers sparks around when you feed it
huge arrays created in large numbers. So your new-gen GCs are still
swift (100-200ms), but they happen too often, and as a consequence
some short-lived objects start leaking into old-gen.
3. You have to reserve space for merges. Fully optimizing an index is
very taxing; I cheat by stopping accepting outside requests, switching
off the memory cache, optimizing, then putting everything back in place.

I'm currently testing the mmap approach, and despite Sun's braindead
API, it works like a charm.

While I'm at it, I have two more questions about MMapDirectory.
How often is openInput() called for a file? Is it worthwhile to do
getChannel().map() when the file is written and closed, and then clone
the buffer for each openInput()?
Why don't you force() a newly-mapped buffer? It would save the first
few searches that hit a new segment from page faults while waiting for
that segment to be loaded.
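A sketch of the map-once, clone-per-reader idea (the class and method names here are illustrative, not MMapDirectory's actual code). One note: MappedByteBuffer.force() flushes changes back to storage, while load() is the call that touches every page up front, so for a read-only mapping load() is presumably what would avoid those first page faults:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Map a finished file once; hand every reader an independent view of it.
class SharedMapping {
    private final MappedByteBuffer master;

    SharedMapping(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            master = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            master.load(); // pre-fault the pages so the first reads don't stall
        }
    }

    // Each openInput() clones the buffer: own position/limit, shared pages.
    ByteBuffer openInput() {
        return master.duplicate();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".dat");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        SharedMapping m = new SharedMapping(tmp);
        ByteBuffer a = m.openInput();
        ByteBuffer b = m.openInput();
        a.position(2);
        assert b.position() == 0;              // positions are independent
        assert a.get(2) == 3 && b.get(0) == 1; // same underlying data
    }
}
```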




Re: Possible IndexInput optimization

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 29 March 2009 13:47:59 Earwin Burrfoot wrote:
> > Earwin,
> > I have not experimented lately, but I'd like to add a general compressed
> > integer array to the basic types in an index, that would be compressed
> > on writing and decompressed on reading.
> > A first attempt is at LUCENE-1410, and one of the choices I had there
> > was whether or not to use NIO buffer methods on the index side.
> > I started there using these NIO buffer methods, but it seems that
> > the explicit byte arrays you're using here could be a good alternative.
> > I think my question boils down to whether or not these NIO buffers will
> > (in the end) get in the way of similar low level optimizations
> > you'd like to see applied here.
> > Regards,
> >
> > Paul Elschot
> In my case I have to switch to MMap/Buffers, Java behaves ugly with
> 8Gb heaps. 

Do you mean that because garbage collection does not perform well
on these larger heaps, one should avoid creating arrays that require
heaps of that size, and rather use (direct) MMap/Buffers?

> I'm thinking of trying to use Short/Int/LongBuffers that
> wrap my initial ByteBuffer.

So far I have used an IntBuffer wrapping a ByteBuffer at LUCENE-1410.
If it's better not to create arrays for data read from the index,
I'll keep it that way, hoping that doesn't run into backward
compatibility problems.

Regards,
Paul Elschot


Re: Possible IndexInput optimization

Posted by Earwin Burrfoot <ea...@gmail.com>.
> Earwin,
> I have not experimented lately, but I'd like to add a general compressed
> integer array to the basic types in an index, that would be compressed
> on writing and decompressed on reading.
> A first attempt is at LUCENE-1410, and one of the choices I had there
> was whether or not to use NIO buffer methods on the index side.
> I started there using these NIO buffer methods, but it seems that
> the explicit byte arrays you're using here could be a good alternative.
> I think my question boils down to whether or not these NIO buffers will
> (in the end) get in the way of similar low level optimizations
> you'd like to see applied here.
> Regards,
>
> Paul Elschot
In my case I have to switch to MMap/Buffers; Java behaves ugly with
8Gb heaps. I'm thinking of trying Short/Int/LongBuffers that wrap my
initial ByteBuffer.
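For what it's worth, that wrapping can be done with ByteBuffer's view methods; a tiny sketch (a heap buffer stands in for the mapped one here):

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

class BufferViews {
    public static void main(String[] args) {
        // In practice this would be the MappedByteBuffer for the file.
        ByteBuffer bytes = ByteBuffer.allocate(16);
        bytes.putInt(42).putInt(7);
        bytes.flip();

        // The typed view shares the same storage; nothing is copied.
        IntBuffer ints = bytes.asIntBuffer();
        assert ints.get(0) == 42;
        assert ints.get(1) == 7;
    }
}
```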



Re: Possible IndexInput optimization

Posted by Paul Elschot <pa...@xs4all.nl>.
Earwin,

I have not experimented lately, but I'd like to add a general compressed
integer array to the basic types in an index, that would be compressed
on writing and decompressed on reading.

A first attempt is at LUCENE-1410, and one of the choices I had there
was whether or not to use NIO buffer methods on the index side.
I started there using these NIO buffer methods, but it seems that
the explicit byte arrays you're using here could be a good alternative.

I think my question boils down to whether or not these NIO buffers will
(in the end) get in the way of similar low level optimizations
you'd like to see applied here.

Regards,
Paul Elschot




Re: Possible IndexInput optimization

Posted by Earwin Burrfoot <ea...@gmail.com>.
> A while ago I tried overriding the read* methods in BufferedIndexInput like
> this:
> ............................
> I'm still surprised there was no performance improvement at all. Maybe
> something was wrong with my test and I should try it again...

For BufferedIndexInput the improvement should be noticeable only when
the file you're reading is loaded completely into the OS disk cache
(which was not your case, I guess). Even then, you're making a syscall
for each 1Kb chunk, and that could well dominate the cost of ~1K
method calls.
But with RAMDirectory/MMapDirectory you're not reading the disk (once
the disk cache has kicked in), and you're not making syscalls. I guess
I should stop asking around and just try it.



Re: Possible IndexInput optimization

Posted by Michael Busch <bu...@gmail.com>.
On 3/29/09 12:43 AM, Earwin Burrfoot wrote:
> There are three cases when we can override readNNN methods and provide
> implementations with zero or minimum method invocations -
> RAMDirectory, MMapDirectory and BufferedIndexInput for
> FSDirectory/CompoundFileReader. Anybody tried this?
>
>    

A while ago I tried overriding the read* methods in BufferedIndexInput 
like this:

    public int readVInt() throws IOException {
        if (5 <= (bufferLength - bufferPosition)) {
            return readVIntFast();
        }
        return super.readVInt();
    }

    private int readVIntFast() throws IOException {
        byte b = buffer[bufferPosition++];
        int i = b & 0x7F;
        // Each VInt byte carries 7 payload bits, so the shift starts at 7.
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buffer[bufferPosition++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }


Notice that I don't rely on ArrayIndexOutOfBoundsException; instead I
do one range check in readVInt() and then call readVIntFast(), which
accesses the buffer array directly to avoid multiple range checks.

Surprisingly, I did not see any performance improvement. In my test I
wrote a huge file (several GBs) of VInts to disk, making sure they
occupied more than just a single byte each. Reading the file with and
without this "optimization" in BufferedIndexInput made almost no
difference. Only when I ran it under a profiler did I see a big
difference, because with this change there are fewer method calls and
hence less invocation-count overhead.

I'm still surprised there was no performance improvement at all. Maybe 
something was wrong with my test and I should try it again...
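As a sanity check on the fast path, here is a self-contained VInt round trip, independent of Lucene's classes. The format packs seven payload bits per byte, low-order group first, with the high bit set on every byte except the last:

```java
import java.io.ByteArrayOutputStream;

class VIntRoundTrip {
    // Encode: emit low 7 bits per byte, high bit = "more bytes follow".
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode with a direct-array loop; the shift starts at 7 because
    // each byte carries 7 payload bits.
    static int readVInt(byte[] buffer) {
        int pos = 0;
        byte b = buffer[pos++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buffer[pos++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) {
        assert writeVInt(16385).length == 3;             // needs 15 bits -> 3 bytes
        assert readVInt(writeVInt(16385)) == 16385;      // round trip
        assert readVInt(writeVInt(1)) == 1;              // single-byte case
    }
}
```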

-Michael
