Posted to dev@lucene.apache.org by Mike Hearn <he...@vinumeris.com> on 2015/09/09 14:46:41 UTC
Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

Thanks for the contribution, Uwe.

So far I think I like Andrew's suggestion of a guard page the most.
Unmapping the guard page boils down to a kind of thread-local variable
without the actual cost of reading anything (in theory). So by
write-protecting the guard page and then unmapping the file, and letting
the GC clean up the guard page later, the same semantics as today are
preserved and there's no race.

I guess, although it's ugly, a system property could control whether the
NIO implementation returns an ordinary MappedByteBuffer or a new subclass,
the UnmappableMappedByteBuffer. HotSpot would then be responsible for
removing the overhead of the virtual calls, as normal. If a customer finds
that the guard page write is causing performance issues for them, they
could use the system property to get the old behaviour back and the unmap
call would throw.
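
To make the system-property switch concrete, here is a rough sketch. It is
only an illustration under stated assumptions: MappedByteBuffer cannot be
subclassed outside java.nio, so PlainMapping and UnmappableMapping are
hypothetical stand-ins, the property name is invented, and a volatile flag
stands in for the guard page (which exists precisely to avoid paying for
such a check on every read).

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch; none of these types or property names exist in the JDK. */
class MappingFactorySketch {
    // Invented property name: setting it to "true" restores the old behaviour.
    static final String DISABLE_PROP = "sketch.nio.disableUnmap";

    /** Stand-in for an ordinary MappedByteBuffer: unmap is not supported. */
    static class PlainMapping {
        protected final ByteBuffer data;

        PlainMapping(ByteBuffer data) {
            this.data = data;
        }

        byte get(int index) {
            return data.get(index);
        }

        void unmap() {
            throw new UnsupportedOperationException(
                    "unmap disabled by -D" + DISABLE_PROP + "=true");
        }
    }

    /** Stand-in for the proposed UnmappableMappedByteBuffer subclass. */
    static class UnmappableMapping extends PlainMapping {
        // Assumption: a volatile flag replaces the guard-page check here.
        // Unlike a guard page, it adds a read on every access and does not
        // close the race with a concurrent reader.
        private volatile boolean unmapped;

        UnmappableMapping(ByteBuffer data) {
            super(data);
        }

        @Override
        byte get(int index) {
            if (unmapped) {
                throw new IllegalStateException("buffer already unmapped");
            }
            return data.get(index);
        }

        @Override
        void unmap() {
            unmapped = true;
        }
    }

    /** Chooses the implementation once, based on the system property. */
    static PlainMapping create(ByteBuffer data) {
        return Boolean.getBoolean(DISABLE_PROP)
                ? new PlainMapping(data)
                : new UnmappableMapping(data);
    }
}
```

Callers only ever see the base type, so HotSpot is free to devirtualize
get() when only one of the two classes is actually loaded.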

But it sounds like users with extreme VMM needs, like Lucene, would find
this a performance win rather than a loss.

I admit that I'm not a JDK dev. Writing such a patch would be possible for
me but I don't have any kind of performance testing rigs, and this tweak
seems to be mostly dominated by performance concerns. Also I'm kind of busy
with other things right now.

On Wed, Sep 9, 2015 at 12:51 PM, Uwe Schindler <us...@apache.org>
wrote:

> Hi,
>
> Dawid Weiss and I are both involved in the Apache Lucene project and we
> know the problems with MappedByteBuffer and unmapping. Dawid already
> responded with a source code link to our impl (which needs to use the hacky
> cleaner() approach; also look at the heavy documentation in this class):
> https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
>
> So we would be very happy to get this issue resolved! The cleaner() hack
> is enabled by default in Lucene if the JVM supports it (so we won't break
> if JIGSAW prevents this, but our *large* users would heavily complain).
>
> >> This is fundamentally about *integrity* of the runtime. It follows there
> >> are security implications, but it’s still fundamentally an integrity
> >> issue and guarding an unsafe operation with a Security Manager is
> >> unfortunately an insufficient solution.
> >
> > Right, and just to add that there have been many attempts over the years
> > to find solutions to this issue. I think the closest was atomically
> > remapping but that wasn't feasible on all platforms and also didn't free
> > up the address space in a timely manner.
>
> So we should really find a solution here. I was talking with several
> people at various conferences (Rory O'Donnel or Mark Reinhold) and we had
> some ideas on how to solve this. My idea is explained below
> (I am not a JVM internals or Hotspot guy, so excuse some obviously "wrong"
> assumptions):
>
> Actually there are 2 issues, not only one. The first issue is, as
> mentioned before: you cannot unmap via API. This is needed for many apps,
> including Apache Lucene, for a reason which comes more from "another" bug,
> and this is my issue #2 (see below).
>
> First, unmapping for Lucene is very important at the moment, because we
> operate on the Lucene indexes purely using mmap (see [1]), and these can
> easily be several hundred gigabytes. On highly dynamic systems, Lucene
> often maps new files (also very large ones) and relies on the fact that
> older, deleted files are unmapped in time (this does not need to be ASAP,
> just "in time"). So we have those 2 "bugs", which force us to unmap:
>
> (1) disk space issues / delete after last close (POSIX) vs. No delete at
> all (Windows)
>
> - disk space: we have seen customers running out of disk space on Lucene,
> because unmapping wasn’t done in time and therefore POSIX with delete on
> last close cannot free the disk space, although the file was already
> deleted. The problem you are seeing on Windows, that you cannot delete, is
> therefore worse on Linux, because it is hidden from the user - you cannot
> free the disk space of the deleted file! Lucene creates and deletes files
> all the time while indexing realtime data (e.g. think of Github's very
> dynamic code search index, which is backed by Lucene/Elasticsearch).
> - virtual memory: If you map huge files (several hundred gigabytes)
> and they are not unmapped in time, you may run out of virtual address
> space. This especially affects Windows, because it does not use the full 46
> bits (or something like that) of addresses. So effectively you can only map
> about 4 terabytes on Windows. If you have fragmentation of address space this
> gets worse (in Lucene, we map in chunks of 1 GiB because of the signed 32-bit
> integer limit of ByteBuffer, so fragmentation is not our biggest issue).
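
For readers unfamiliar with the chunking mentioned above, a minimal sketch of
mapping a file in power-of-two chunks follows. This is not Lucene's actual
MMapDirectory code; the class and method names are made up for illustration,
and the chunk size is a parameter (30 gives the 1 GiB chunks described).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative sketch only; names are hypothetical, not Lucene's API. */
class ChunkedMapSketch {

    /** Maps the whole file read-only in chunks of 2^chunkPower bytes. */
    static MappedByteBuffer[] mapInChunks(Path file, int chunkPower) throws IOException {
        // A single ByteBuffer is indexed by a signed 32-bit int, so one
        // chunk may be at most 2^30 bytes if it must be a power of two.
        if (chunkPower < 1 || chunkPower > 30) {
            throw new IllegalArgumentException("chunkPower must be in 1..30");
        }
        long chunkSize = 1L << chunkPower;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long length = channel.size();
            int chunkCount = (int) ((length + chunkSize - 1) >>> chunkPower);
            MappedByteBuffer[] chunks = new MappedByteBuffer[chunkCount];
            long position = 0;
            for (int i = 0; i < chunkCount; i++) {
                // The last chunk may be shorter than chunkSize.
                long size = Math.min(chunkSize, length - position);
                chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, position, size);
                position += size;
            }
            // The mappings stay valid after the channel is closed; they
            // are only released once the buffers are garbage collected.
            return chunks;
        }
    }

    /** Reads one byte at an absolute file position across the chunk array. */
    static byte readByte(MappedByteBuffer[] chunks, int chunkPower, long position) {
        int chunk = (int) (position >>> chunkPower);
        int offset = (int) (position & ((1L << chunkPower) - 1));
        return chunks[chunk].get(offset);
    }
}
```

Note that closing the channel does not unmap anything, which is exactly the
problem this thread is about: the address space and disk space are held
until the collector gets around to the buffers.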
>
> (2) It takes veeeeeeeeeeeeeeeery long time until the unmapping actually
> occurs!
>
> This is the real bug! If the garbage collector cleaned up the buffers
> asap, we would not need to unmap from user code. In Lucene we just delay
> the file delete on Windows, so we are not really affected by the file
> deletion inability (but it would be nice if that could be fixed).
>
> If you look at the usage pattern of those huge, mapped files, you will see
> why they are in most cases *never ever* unmapped automatically: Lucene maps
> very large files and uses them for a long time. So the MappedByteBuffer
> object gets migrated to older generations on the heap. Garbage collection
> there happens, of course, with a long delay. That would not be the most
> problematic part, but there is a second issue: The MappedByteBuffer object
> is just a very small object (in heap size measurement: just an object
> header and a few pointers), so the garbage collector does not see it as
> heavy! It's just a very small object instance of about 30 bytes. Why should
> the garbage collector clean it up? And in fact it will almost never do this!
> The garbage collector cannot see that our 30-byte object instance "sits"
> on something like 300 gigabytes of virtual memory and disk space!
>
> One proposal to fix this would be to add something like an internal
> OpenJDK Java annotation or similar with which you can "mark" heavy objects,
> so the garbage collector could free them preferentially (similar to
> sun.misc.Contended).
>
> For the Apache Lucene team,
> Uwe
>
> [1]
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> -----
> Uwe Schindler
> uschindler@apache.org
> ASF Member, Apache Lucene PMC / Committer
> Bremen, Germany
> http://lucene.apache.org/