Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2023/08/17 10:43:48 UTC

WrongThreadException using the new Panama MMap on Java 19

Hi Team,

We hit an interesting and exciting intermittent exception in our
customer-facing product search instance (all Lucene!) at Amazon:

 java.lang.WrongThreadException: Attempted access outside owning thread
   at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:460)
   at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113)
   at java.base/jdk.internal.misc.ScopedMemoryAccess.getByte(ScopedMemoryAccess.java:518)
   at java.base/java.lang.invoke.VarHandleSegmentAsBytes.get(VarHandleSegmentAsBytes.java:109)
   at java.base/java.lang.foreign.MemorySegment.get(MemorySegment.java:1103)
   at org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.readByte(MemorySegmentIndexInput.java:485)
   at org.apache.lucene.util.fst.ReverseRandomAccessReader.readByte(ReverseRandomAccessReader.java:33)
   at org.apache.lucene.util.fst.FST.findTargetArc(FST.java:1444)
   at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:511)
   at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:111)
   at org.apache.lucene.index.TermStates.build(TermStates.java:96)


We are using Corretto Java full version:


  openjdk full version "19.0.2+9"


Looking at how Uwe's magic mrjar code works, it doesn't look like we ever
make a thread private MemorySegment?  If so, I don't see how this exception
could be occurring :)  We seem to do this:

final MemorySession session = MemorySession.openShared();

Or, maybe we do sometimes make thread private memory segments, and maybe we
(Amazon's sources) have a silly thread over-sharing bug, but so far I think
that's unlikely -- we are calling TermStates.build from a single thread,
which under the hood clones/slices the MMap IndexInputs to seek the terms
dictionary on each segment and only that one thread ever interacts with
those.  It's all just one thread under TermStates.build.
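
For readers less familiar with the distinction, here is a minimal sketch of thread-confined versus shared segments. It is not Lucene's code. It assumes a JDK 22+ runtime, where the foreign memory API is final and Arena.ofShared() takes the place of the Java 19 preview call MemorySession.openShared() quoted above. The class and method names are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: uses the finalized java.lang.foreign API (assumed JDK 22+),
// not the Java 19 preview MemorySession API discussed in this thread.
class ConfinedVsShared {

  // Allocates a small segment on the calling thread, reads one byte from a
  // second thread, and returns whatever that read threw (null on success).
  static Throwable readFromOtherThread(boolean shared) {
    try (Arena arena = shared ? Arena.ofShared() : Arena.ofConfined()) {
      MemorySegment segment = arena.allocate(16);
      AtomicReference<Throwable> failure = new AtomicReference<>();
      Thread reader = new Thread(() -> {
        try {
          segment.get(ValueLayout.JAVA_BYTE, 0);
        } catch (Throwable t) {
          failure.set(t);
        }
      });
      reader.start();
      try {
        reader.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
      return failure.get();
    }
  }

  public static void main(String[] args) {
    System.out.println("shared:   " + readFromOtherThread(true));
    System.out.println("confined: " + readFromOtherThread(false));
  }
}
```

With a confined arena the read from the second thread should fail with WrongThreadException, while with a shared arena it should succeed, which is why the exception looks impossible if every segment really is shared.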


This only happened on a few hosts and only for a short period of time,
making me suspect some sort of intermittent JVM bug (e.g. HotSpot
miscompilation or so).  It is clearly very rare, so we are still using the
new MMap (which btw seems to be a big performance gain for our service,
which we are still trying to fully understand, more on that later!).


Has anyone else seen such errant exceptions with the new Panama based
MMap?  Are there any known Java issues that smell like this?  (A quick
search on bugs.openjdk.org (
https://bugs.openjdk.org/browse/JDK-8287809?jql=issuetype%20%3D%20Bug%20AND%20text%20~%20WrongThreadException)
did not seem to turn up any obvious candidates).


Thanks,


Mike McCandless

http://blog.mikemccandless.com

Re: WrongThreadException using the new Panama MMap on Java 19

Posted by Michael McCandless <lu...@mikemccandless.com>.
Thanks Uwe!  Responses inlined below:

On Thu, Aug 17, 2023 at 9:46 AM Uwe Schindler <uw...@thetaphi.de> wrote:

> this error indeed cannot happen as all our segments are shared. It could
> still be some bug in the Java 19 version, did you try Java 21 or Java 20?
>
OK, phew, that is what I thought (from reading Lucene's sources).  We are
testing Java 20 now (though EOL is in ~4 weeks -- hard to keep up!), and
soon Java 21.  Maybe we'll just go straight to Java 21 if there are no
problems.  This exception only happened on a single host (and JVM) across a
great many Lucene instances we run, so whatever the bug is, it looks very
rare.  But, for that host, the exception happened a great many times, for
the few hours that this JVM was alive.

> It may also be a Corretto problem, maybe contact their team, maybe they
> have applied some changes. ScopedMemoryAccess is using an extension to the
> original Java memory model internally (I think they changed something in the
> specs), so it changed quite a lot internally. Maybe Corretto has some
> patches for HotSpot that make the memory model changes hit us?
>
Good idea -- we will reach out to the Corretto team.  Scary to be using
extensions to JMM which was already complex enough to begin with!

> I don't think the bug is in Lucene's code, because if a segment is shared,
> it is shared. Maybe some other problem could be: Have you maybe
> accidentally closed the IndexInput too early? Normally this should cause an
> IllegalStateException (we have a test for this), but I am not fully sure
> what happens if the shared scope was already closed. I remember there were
> some bugs in 19, but it is already too long ago. So please try with plain
> OpenJDK Java 21 (or 20).
>
I don't think we are closing IndexInputs too early -- we would see many
many exceptions if so, I hope.  We will expedite getting off 19.
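
As a point of reference for the "closed too early" case, here is a minimal sketch of what a read after close looks like at the JDK level. It assumes a JDK 22+ runtime with the final Arena API rather than the Java 19 preview API, and it uses a plain allocated segment instead of a Lucene IndexInput. The class and method names are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Sketch only: assumed JDK 22+ API, not Lucene's MemorySegmentIndexInput.
class ReadAfterClose {

  // Returns the exception thrown when reading a segment whose shared arena
  // has already been closed, or null if the read unexpectedly succeeds.
  static RuntimeException readAfterClose() {
    Arena arena = Arena.ofShared();
    MemorySegment segment = arena.allocate(16);
    arena.close(); // segment is no longer alive after this point
    try {
      segment.get(ValueLayout.JAVA_BYTE, 0);
      return null;
    } catch (RuntimeException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    System.out.println(readAfterClose());
  }
}
```

On a correctly behaving JDK this read should fail with IllegalStateException rather than WrongThreadException, which matches the expectation in the quoted paragraph.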

> I would like to know more about the speed improvements! In our
> benchmarking they were not so visible (only a slight change), so happy to
> see more.
>
Right, we were surprised too!  We are still trying to isolate where the
gains came from, but they were impactful for us (~5-7% reduction in CPU
time somehow).  We kept Java at 19 (we try not to upgrade both at the same
time!).

Mike McCandless

http://blog.mikemccandless.com

Re: WrongThreadException using the new Panama MMap on Java 19

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

this error indeed cannot happen as all our segments are shared. It could 
still be some bug in the Java 19 version, did you try Java 21 or Java 20?

It may also be a Corretto problem, maybe contact their team, maybe they
have applied some changes. ScopedMemoryAccess is using an extension to
the original Java memory model internally (I think they changed something
in the specs), so it changed quite a lot internally. Maybe Corretto has
some patches for HotSpot that make the memory model changes hit us?

I don't think the bug is in Lucene's code, because if a segment is
shared, it is shared. Maybe some other problem could be: Have you maybe
accidentally closed the IndexInput too early? Normally this should cause
an IllegalStateException (we have a test for this), but I am not fully
sure what happens if the shared scope was already closed. I remember
there were some bugs in 19, but it is already too long ago. So please
try with plain OpenJDK Java 21 (or 20).

I would like to know more about the speed improvements! In our 
benchmarking they were not so visible (only a slight change), so happy 
to see more.

Uwe

-- 
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:uwe@thetaphi.de