You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/20 18:06:44 UTC

[GitHub] [lucene] uschindler opened a new pull request, #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

uschindler opened a new pull request, #912:
URL: https://github.com/apache/lucene/pull/912

   **INFO: This is a followup of #518: It's the same code base, but with API changes from JDK 19 applied**
   
   This is just a draft PR for a first insight on memory mapping improvements in JDK 19+.
   
   Some background information: Starting with JDK-14, there is a new incubating module "jdk.incubator.foreign" that has a new, not yet stable API for accessing off-heap memory (and later it will also support calling functions using classical MethodHandles that are located in libraries like .so or .dll files). This incubator module has several versions:
   - first version: https://openjdk.java.net/jeps/370 (slow, very buggy and thread confinement, so making it unuseable with Lucene)
   - second version: https://openjdk.java.net/jeps/383 (still thread confinement, but now allows transfer of "ownership" to other threads; this is still impossible to use with Lucene.
   - third version in JDK 16: https://openjdk.java.net/jeps/393 (this version has included "Support for shared segments"). This now allows us to safely use the same external mmaped memory from different threads and also unmap it! This was implemented in the previous pull request #173
   - fourth version in JDK 17: https://openjdk.java.net/jeps/412 . This mainly changes the API around the scopes. Instead of having segments explicitely made "shared", we can assign them to some resource scope which control their behaviour. The resourceScope is produced one time for each IndexInput instance (not clones) and owns all segments. When the resourceScope is closed, all segments get invalid - and we throw `AlreadyClosedException`. The big problem is slowness due to heavy use of new instances just to copy memory between segments and java heap. This drives garbage collector crazy. This was implemented in previous PR #177 
   - fifth version in JDK 18, included in build 26: https://openjdk.java.net/jeps/419. This mainly cleans up the API. From Lucene's persepctive the `MemorySegment` API now has `System.arraycopy()`-like APIs to copy memory between heap and memory segments. This improves speed. It also handles byte-swapping automatically. This version of the PR also uses `ValueLayout` instead of varhandles, as it makes code more readable and type-safe.
   - sixth version in JDK 19, included with build 23: https://openjdk.java.net/jeps/424 (actual version). This version moves the whole incubation API as a stable "preview API". Some classes were renamed and bugs (e.g. on Windows with huge mappings) were resolved.
   
   This new preview API more or less overcomes several problems:
   - ByteBuffer API is limited to 32bit (in fact MMapDirectory has to chunk in 1 GiB portions)
   - There is no official way to unmap ByteBuffers when the file is no longer used. There is a way to use `sun.misc.Unsafe` and forcefully unmap segments, but any IndexInput accessing the file from another thread will crush the JVM with SIGSEGV or SIGBUS. We learned to live with that and we happily apply the unsafe unmapping, but that's the main issue.
   
   @uschindler had many discussions with the team at OpenJDK and finally with the third incubator, we have an API that works with Lucene. It was very fruitful discussions (thanks to @mcimadamore !)
   
   With the third incubator we are now finally able to do some tests (especially performance). With JDK 19, we can do testing the following way by tweaking the build system a bit:
   - disable `-Werror` everywhere
   - upgrade minimum and release Java version to 19 (this breaks linting with ECJ, so precommit won't work, but running tests does)
   - add `--enable-preview` parameter for all modules and test runner
   
   The code basically just modifies `MMapDirectory` to use LONG instead of INT for the chunk size parameter. In addition it adds `MemorySegmentIndexInput` that is a copy of our `ByteBufferIndexInput` (still there, but unused), but using MemorySegment instead of ByteBuffer behind the scenes. It works in exactly the same way, just the try/catch blocks for supporting EOFException or moving to another segment were rewritten.
   
   It passes all tests and it looks like you can use it to read indexes. The default chunk size is now 16 GiB (but you can raise or lower it as you like; tests are doing this). Of course you can set it to Long.MAX_VALUE, in that case every index file is always mapped to one big memory mapping. My testing with Windows 10 have shown, that this is *not a good idea!!!*. Huge mappings fragment address space over time and as we can only use like 43 or 46 bits (depending on OS), the fragmentation will at some point kill you. So 16 GiB looks like a good compromise: Most files will be smaller than 6 GiB anyways (unless you optimize your index to one huge segment). So for most Lucene installations, the number of segments will equal the number of open files, so Elasticsearch huge user consumers will be very happy. The sysctl max_map_count may not need to be touched anymore.
   
   In addition, this implements `readLongs` in a better way than @jpountz did (no caching or arbitrary objects). The new foreign-vector APIs will in future also be written with MemorySegment in its focus. So you can allocate a vector view on a MemorySegment and let the vectorizer fully work outside java heap inside our mmapped files! :-)_
   
   It would be good if you could checkout this branch and try it in production.
   
   According to speed tests it should be as fast as MMAPDirectory, partially also faster because less switching between byte-buffers is needed. With recent optimizations also `long`-based absolute access in loops should be faster.
   
   But be aware:
   - You need JDK 17 to run Gradle (set `JAVA_HOME` to it)
   - You need JDK-19-ea+23 (set `RUNTIME_JAVA_HOME` to it)
   - All JAR files will be in Java 19 format and require  JDK-19 to execute.
   - Also you need to add `--enable-preview` to the command line of your Java program/Solr server/Elasticsearch server
   
   It would be good to get some benchmarks, especially by @rmuir or @mikemccand. _Take your time and enjoy the complexity of setting this up!_ ;-)
   
   My plan is the following:
   - report any bugs or slowness, especially with Hotspot optimizations.
   - Add a self-standing/separate JDK-19 compiled module as external JAR. This can be added to classpath or module-path and be used by Elasticsearch or Solr. I will work on a Lucene-external project to do this. This solves the problems with all class files need to be compiled against Java 19.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140807127

   Thanks @mocobeta,
   I just need to not forget to reenable in September when this gets merged (the we van also use Autoprovision of JDK 19).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314125432

   Maybe at some point JDK allows to call `Checksum#update(MemorySegment,...)`. Then we don't need to copy multiple times, but the checksum will be calculated using the off-heap slice.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1291795040

   Hi @mcimadamore 
   
   > @uschindler Just curious - did you get any feedback re. performance? Thanks
   
   No official feedback yet. But Elasticsearch and Opensearch seem to benchmark. Unfortunately end-users have not yet used it on their side, because no release of Elasticserach/Opensearch using Lucene 9.4 was done yet. So we may only get feedback from pure Lucene users and there are not many.
   
   For Apache Solr, the latest release was also without Lucene 9.4. In former times this ws easier, because at least Lucene and Solr were released together, but that's no longer the case. So we have to wait.
   
   I will ask @mikemccand to at least enable `--enable-preview` on the nightly pure Lucene benchmark by default (and use JDK 19).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979990105


##########
gradle/generation/local-settings.gradle:
##########
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   Done. BTW, Policeman Jenkins was doing exactly this before the auto-provisioning worked. The JAVA19_HOME env var is currently the only one used. JAVA17_HOME was just added for consistency.
   
   Could you still test this on your Mac? Github jobs already did this and all worked. It will automatically download like any other JAR file from Maven Central. To get rid of it and for testing do `rm -rf ~/.gradle/jdks/*`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140717409

   Thanks for the explanation, Uwe. In the meantime I recalled what it was about toolchains that was a showstopper - indeed it was the possibility to specify a toolchain pointing at a particular (local) installation - such as a custom-compiled JDK. I couldn't find a way to do it. If you take a look at alternative-jdk-support, it even has a comment about it:
   ```
   // I failed to set it up leveraging Gradle's toolchains because
   // a toolchain spec is not flexible enough to provide an exact location of the JVM to be used;
   // if you have two identical JVM lang. versions in auto-discovered JVMs, an arbitrary one is used (?).
   // This situation is not uncommon when debugging low-level stuff (hand-compiled JVM binaries).
   ```
   
   Eh.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mocobeta commented on a diff in pull request #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r878666674


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -232,55 +224,62 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    try (FileChannel c = FileChannel.open(path, StandardOpenOption.READ)) {
-      final String resourceDescription = "MMapIndexInput(path=\"" + path.toString() + "\")";
-      final boolean useUnmap = getUseUnmap();
-      return ByteBufferIndexInput.newInstance(
-          resourceDescription,
-          map(resourceDescription, c, 0, c.size()),
-          c.size(),
-          chunkSizePower,
-          new ByteBufferGuard(resourceDescription, useUnmap ? CLEANER : null));
+    final String resourceDescription = "MemorySegmentIndexInput(path=\"" + path.toString() + "\")";
+
+    // Work around for JDK-8259028: we need to unwrap our test-only file system layers
+    path = Unwrappable.unwrapAll(path);
+
+    boolean success = false;
+    final MemorySession session = MemorySession.openShared();
+    try (var fc = FileChannel.open(path)) {
+      final long fileSize = fc.size();
+      final MemorySegment[] segments = map(session, resourceDescription, fc, fileSize);
+      final IndexInput in =
+          MemorySegmentIndexInput.newInstance(
+              resourceDescription, session, segments, fileSize, chunkSizePower);
+      success = true;
+      return in;
+    } finally {
+      if (success == false) {
+        session.close();
+      }
     }
   }
 
-  /** Maps a file into a set of buffers */
-  final ByteBuffer[] map(String resourceDescription, FileChannel fc, long offset, long length)
+  /** Maps a file into a set of segments */
+  final MemorySegment[] map(
+      MemorySession session, String resourceDescription, FileChannel fc, long length)
       throws IOException {
     if ((length >>> chunkSizePower) >= Integer.MAX_VALUE)
-      throw new IllegalArgumentException(
-          "RandomAccessFile too big for chunk size: " + resourceDescription);
+      throw new IllegalArgumentException("File too big for chunk size: " + resourceDescription);
 
     final long chunkSize = 1L << chunkSizePower;
 
-    // we always allocate one more buffer, the last one may be a 0 byte one
-    final int nrBuffers = (int) (length >>> chunkSizePower) + 1;
+    // we always allocate one more segments, the last one may be a 0 byte one
+    final int nrSegments = (int) (length >>> chunkSizePower) + 1;
 
-    ByteBuffer[] buffers = new ByteBuffer[nrBuffers];
+    final MemorySegment[] segments = new MemorySegment[nrSegments];
 
-    long bufferStart = 0L;
-    for (int bufNr = 0; bufNr < nrBuffers; bufNr++) {
-      int bufSize =
-          (int) ((length > (bufferStart + chunkSize)) ? chunkSize : (length - bufferStart));
-      MappedByteBuffer buffer;
+    long startOffset = 0L;
+    for (int segNr = 0; segNr < nrSegments; segNr++) {
+      long segSize = (length > (startOffset + chunkSize)) ? chunkSize : (length - startOffset);

Review Comment:
   I'm resolving this - it's a local variable in the loop with a smaller scope, I just noticed that it looks like that other parts of this patch try to use final as far as possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] sherman commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
sherman commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1324918186

   Hi, @uschindler!
   
   I'm experimenting with various improvements of I/O in Lucene.
   
   For a base-line, I use the old implementation of MMapDirectory.
   
   The first idea, when I see this code, is to speed up it with a good old Unsafe.
   I did it and got 7-15% in my final queries (we have a proprietary implementation of a search engine on top of Apache Lucene). Comparing to ByteBuffer implementation, it's clear, why the unsafe implementation is faster (don't have range checks, etc).
   
   The second difference, probably you don't need to have a multiple buffers because you can mmap the entire file at once (on 64 bit systems).
   
   Then, I saw the new code on top of a new foreign memory API. I use JDK 17 and the API is a little changed since.
    So, I reimplemented the directory. But I was surprised by the performance regression I got.
   
   My implementation is 1.5-2 slower than ByteBuffer/Unsafe implementation.
   
   May be the problem in performance regressions in JDK 17 (might be fixed in JDK 19 already), or incorrect usage of API (varhandlers are very cranky in terms of C2 opt/inlining).
   
   Before digging into the problem, I'd like to know, do you have any isolated benchmarks of your new implementation and if yes, how it could be compared to the previous implementation?
   
   How readByte() method looks like in my impl.
   ```java
   @Override
   public byte readByte() throws IOException {
     var value = MemoryAccess.getByteAtOffset(memorySegment, position);
     position += Byte.BYTES;
     return value;
   }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mcimadamore commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mcimadamore commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1325104378

   > More investigation is required to understand where the difference comes from.
   
   After playing a bit more with the benchmark, it looks like 1ns is coming from the various alignment checks we need to perform - whereas ~0.5ns is coming from the segment liveness check. Of course, since this is a single call, the cost for these checks cannot be amortized. We can probably improve things a little (esp. when there's no alignment involved), but the numbers above reflect the fact that the memory segment implementation has more checks, so at the nanosecond scale, everything matters of course.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1317634632

   Code similar like this: https://github.com/openjdk/jdk/blob/37848a9ca2ab3021e7b3b2e112bab4631fbe1d99/src/java.base/share/classes/java/nio/X-Buffer.java.template#L929
   
   The of statement uses a simple copy loop to read the values if length is too short.
   
   We can do the same in our code by calling super to read into array.
   
   It would be worth a try.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140517332

   Hi Uwe. I recall I had problems with toolchains - some things just refused to work (or we couldn't customize them? Can't remember precisely what the problem was). I looked at the code you wrote and it seems all right at a glance. Some things will be awkward, like "rewinding" the javaHome set in other places... perhaps it'd be good to not set it there at all (move the toolchain declaration and runtime java selection into a single script) - can't say what will turn out to be better.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1317618039

   Could be. It may be caused by too much coping of very small arrays (especially float and long) between offheap and heap. For explanation see above.
   
   I don't think this is a huge problem because it mainly comes from missing optimizations in the JVM and not our code. Could look different in Java 20. My only guess where it might come from: The overhead of copyMemory is huge for small arrays. ByteBuffer internally loops for very short arrays. We could try to call super.readLongs() if array is short (same for floats). But before doing this I would like to figure out what exactly slows.
   
   If I could get a profile only from this test or somebody could explain what mmap dir actions are used mainly in those facets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314053561

   one issue with "fancy" copies (e.g. for bulk merging) is that we can't make them super-fancy anyway (e.g. zero-copy from fd to fd) as we still have to calculate CRC32 checksum.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mcimadamore commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mcimadamore commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1324996036

   > Code similar like this: https://github.com/openjdk/jdk/blob/37848a9ca2ab3021e7b3b2e112bab4631fbe1d99/src/java.base/share/classes/java/nio/X-Buffer.java.template#L929
   > 
   > The of statement uses a simple copy loop to read the values if length is too short.
   > 
   > We can do the same in our code by calling super to read into array.
   > 
   > It would be worth a try.
   
   @uschindler AFAIK, the limit in the buffer implementation is very low (it is set to 6 bytes). This means that just copying two int values (8 bytes) will overflow, and fallback to a bulk copy. So, unless you have a lot of copy operations with 0-1 elements, I doubt that this is really the culprit. More scientific benchmark below:
   
   ```
   Benchmark                      (ELEM_SIZE)  Mode  Cnt   Score   Error  Units
   SegmentCopy.buffer_copy                  5  avgt   30   3.952 ± 0.030  ns/op
   SegmentCopy.buffer_copy                 10  avgt   30   3.838 ± 0.053  ns/op
   SegmentCopy.buffer_copy                 50  avgt   30   5.069 ± 0.054  ns/op
   SegmentCopy.buffer_copy                100  avgt   30   8.157 ± 0.182  ns/op
   SegmentCopy.buffer_copy                500  avgt   30  21.651 ± 3.558  ns/op
   SegmentCopy.buffer_copy               1000  avgt   30  52.112 ± 9.626  ns/op
   SegmentCopy.segment_copy                 5  avgt   30   6.002 ± 0.070  ns/op
   SegmentCopy.segment_copy                10  avgt   30   5.649 ± 0.062  ns/op
   SegmentCopy.segment_copy                50  avgt   30   7.307 ± 0.420  ns/op
   SegmentCopy.segment_copy               100  avgt   30   8.894 ± 0.054  ns/op
   SegmentCopy.segment_copy               500  avgt   30  18.369 ± 0.449  ns/op
   SegmentCopy.segment_copy              1000  avgt   30  53.127 ± 9.788  ns/op
   SegmentCopy.segment_copy_loop            5  avgt   30   7.369 ± 0.086  ns/op
   SegmentCopy.segment_copy_loop           10  avgt   30   8.651 ± 0.106  ns/op
   SegmentCopy.segment_copy_loop           50  avgt   30  11.562 ± 0.133  ns/op
   SegmentCopy.segment_copy_loop          100  avgt   30  15.543 ± 0.315  ns/op
   SegmentCopy.segment_copy_loop          500  avgt   30  35.992 ± 0.608  ns/op
   SegmentCopy.segment_copy_loop         1000  avgt   30  65.584 ± 1.539  ns/op
   
   ```
   
   The performance is comparable, but on small sizes, the segment version seems to be worse off. This has nothing to do with the fact that we use a bulk copy (in fact, as demonstrated in the `segment_copy_loop` benchmark, not using a bulk copy is slower for all sizes considered - which is consistent with what I'm seeing in the ByteBuffer code). The segment copy code seems to have a fixed 2ns cost over the buffer variant. More investigation is required to understand where the difference comes from.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1253646592

   Current output:
   
   ```
   Starting a Gradle Daemon (subsequent builds will be faster)
   Directory 'C:\Users\Uwe Schindler\.gradle\daemon\7.3.3\(custom paths)' (system property 'org.gradle.java.installations.paths') used for java installations does not exist
   
   > Task :errorProneSkipped
   WARNING: errorprone disabled (skipped on builds not running inside CI environments, pass -Pvalidation.errorprone=true to enable)
   
   > Task :lucene:core:compileMain19Java FAILED
   
   FAILURE: Build failed with an exception.
   
   * What went wrong:
   Execution failed for task ':lucene:core:compileMain19Java'.
   > Error while evaluating property 'javaCompiler' of task ':lucene:core:compileMain19Java'
      > Failed to calculate the value of task ':lucene:core:compileMain19Java' property 'javaCompiler'.
         > Unable to download toolchain matching these requirements: {languageVersion=19, vendor=ADOPTOPENJDK, implementation=vendor-specific}
            > Unable to download toolchain. This might indicate that the combination (version, architecture, release/early access, ...) for the requested JDK is not available.
               > Could not read 'https://api.adoptopenjdk.net/v3/binary/latest/19/ga/windows/x64/jdk/hotspot/normal/adoptopenjdk' as it does not exist.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler merged pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler merged PR #912:
URL: https://github.com/apache/lucene/pull/912


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140527102

   Hi Dawid,
   Thanks for feedback. Indeed the Tool chain API is very limited and in my honest opinion far from useful for Lucenes usages. There's no way to configure anything inside the Gradle files. Things like Autodownload or provider urls or locations of preinstalled jdks like on Jenkins cannot be put into the build files. It is also not possible to tell the system to download EA releases. It is all a desaster. I will tomorrow start a Twitter thread about this. 
   
   The tool chain API should be flexible and allow plugins to configure own repository types. In general the tool chains should be standard configurations and dependencies in Gradle. The wax how it is now is a dead end: inflexible, buggy and looks like written by a student as bachelor thesis in isolation.
   
   I would wifmsh you change a configuration "toolkit" like any other and add the dependency to the java version there. Compile tasks would then use it automatically. Plugins can implement repositories in same way like Maven or ivy. EMG. One plugin for adoptopenjdk/ eclipse and another one for EA releases.
   
   About your comments: o wanted to keep the tool chain JVM hacks at one place because they currently only affect Lucene Core. Once we have a release of  Java 19 we can simplify that (see code commented out). Other tasks than javac and jar are not affected. Javadics are not built and ecj/forbiddenapis/errorprone do not work at all, so I disabled them. It is just 2 classes to compile which are package private. If we get more MR-JAR stuff we may think of better ways. The current approach is a separate "singleton- in whole build. Maybe you can make it generic, but the combination with EA releases makes this impossible at moment.
   
   On top of that, preview APIs are another big problem. The Class files generated have a special minor version in class file format marking them as preview. This makes class loader picky. The code here won't run with Java 20, we will need a separate MR-JAR section for 20 and then for final release of panama in LTS 21 with unmarked class files. If the APIs will not change we may simplify this by removing the minor class version by patching class file version and only have the java 19 class files.
   
   You may read more about preview APIs in the jep, it's a mess to maintain: https://openjdk.java.net/jeps/12
   
   At moment I also tried this PR with Luke:
   - if you start Luke with Java 17 all works as before, it won't see new classes and loads the byte buffer impl
   - if you start with Java 19 it will find the new classes, but class loader will reject to load them. Lucene will print a warning (you see it in Luke's log window and console). It will use byte buffer impl then
   - if you start Luke with Java 19 and `--enable-preview`, it will use new impl. So this is still opt in, which is good with preview APIs.
   - if you start with a possible Java 20, with or without preview enabled, it will reject the class files and falls back to our old byte buffer impl. In that case it logs warning to update Lucene. 😉
   
   So this looks like working good and people can optionally test new mmap with Java 19 if they opt in (Solr or Elasticsearch).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] sherman commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
sherman commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1324961786

   @mcimadamore Thank you for the detailed explanation. I've got the point. I need to upgrade to JDK 19.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1291905742

   > I will ask @mikemccand to at least enable --enable-preview on the nightly pure Lucene benchmark by default (and use JDK 19).
   
   Ack -- I'll enable this starting from tonite's run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1292321865

   @mikemccand Maybe do this in a 2 step process:
   - First update to Java 19, so we get a clean state
   - Then add `--enable-preview`
   
   If we do both at same time, we won't see a difference between old and new Lucene MMAP (on same version). A JDK upgrade may also change other performance numbers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mcimadamore commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mcimadamore commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1324947263

   @sherman I believe there is a big difference in performance between Java 19 and Java 17. Java 19 is the first release of the API where loop optimizations and range check elimination are correctly implemented for loops over `long` induction variable. Up to Java 19, there used to be some black magic in the FFM API implementation to use a "fast path" when a segment was detected to have a size that would fit an integer. But this resulted in multiple code paths, and, sometimes, suboptimal optimizations.
   
   On top of that, IIRC, Java 17 is missing all the static `copy` methods (these were added in Java 18). These have been added precisely to address some performance concerns raised by the Lucene team (as copying data from heap arrays to off-heap memory would require, using the Java 17 API, creating intermediate memory segment instances).
   
   Finally, Java 18 also add instance dereference methods on the MemorySegment class. E.g. no more static methods on `MemoryAccess` (that class is now gone). Again, this provides some improvements, as C2 is better equipped to speculate on the receiver type of a memory segment.
   
   @uschindler can correct me if I'm wrong, but I believe that the numbers he got against Java 17 were not too great.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979984774


##########
gradle/generation/local-settings.gradle:
##########
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   yes, I wanted to change this, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979991502


##########
gradle/generation/local-settings.gradle:
##########
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   FYI, to actually run all tests, you still need to set RUNTIME_JAVA_HOME as before. The autoprovisioned JDK is only used for compiling the code and packaging the MR-JAR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1311033936

   After Mike switched to preview mode the results look good. The speed with MemorySegmentIndexInput ist similar to old ByteBuffer code.
   
   https://home.apache.org/~mikemccand/lucenebench/
   
   See change EZ. Of course what we can't test is highly concurrent reopening of index and therefore closing shared MemorySessions at high ratio.
   
   CC @mcimadamore 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mocobeta commented on a diff in pull request #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r878663653


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -232,55 +224,62 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    try (FileChannel c = FileChannel.open(path, StandardOpenOption.READ)) {
-      final String resourceDescription = "MMapIndexInput(path=\"" + path.toString() + "\")";
-      final boolean useUnmap = getUseUnmap();
-      return ByteBufferIndexInput.newInstance(
-          resourceDescription,
-          map(resourceDescription, c, 0, c.size()),
-          c.size(),
-          chunkSizePower,
-          new ByteBufferGuard(resourceDescription, useUnmap ? CLEANER : null));
+    final String resourceDescription = "MemorySegmentIndexInput(path=\"" + path.toString() + "\")";
+
+    // Work around for JDK-8259028: we need to unwrap our test-only file system layers
+    path = Unwrappable.unwrapAll(path);
+
+    boolean success = false;
+    final MemorySession session = MemorySession.openShared();
+    try (var fc = FileChannel.open(path)) {
+      final long fileSize = fc.size();
+      final MemorySegment[] segments = map(session, resourceDescription, fc, fileSize);
+      final IndexInput in =
+          MemorySegmentIndexInput.newInstance(
+              resourceDescription, session, segments, fileSize, chunkSizePower);
+      success = true;
+      return in;
+    } finally {
+      if (success == false) {
+        session.close();
+      }
     }
   }
 
-  /** Maps a file into a set of buffers */
-  final ByteBuffer[] map(String resourceDescription, FileChannel fc, long offset, long length)
+  /** Maps a file into a set of segments */
+  final MemorySegment[] map(
+      MemorySession session, String resourceDescription, FileChannel fc, long length)
       throws IOException {
     if ((length >>> chunkSizePower) >= Integer.MAX_VALUE)
-      throw new IllegalArgumentException(
-          "RandomAccessFile too big for chunk size: " + resourceDescription);
+      throw new IllegalArgumentException("File too big for chunk size: " + resourceDescription);
 
     final long chunkSize = 1L << chunkSizePower;
 
-    // we always allocate one more buffer, the last one may be a 0 byte one
-    final int nrBuffers = (int) (length >>> chunkSizePower) + 1;
+    // we always allocate one more segments, the last one may be a 0 byte one
+    final int nrSegments = (int) (length >>> chunkSizePower) + 1;
 
-    ByteBuffer[] buffers = new ByteBuffer[nrBuffers];
+    final MemorySegment[] segments = new MemorySegment[nrSegments];
 
-    long bufferStart = 0L;
-    for (int bufNr = 0; bufNr < nrBuffers; bufNr++) {
-      int bufSize =
-          (int) ((length > (bufferStart + chunkSize)) ? chunkSize : (length - bufferStart));
-      MappedByteBuffer buffer;
+    long startOffset = 0L;
+    for (int segNr = 0; segNr < nrSegments; segNr++) {
+      long segSize = (length > (startOffset + chunkSize)) ? chunkSize : (length - startOffset);

Review Comment:
   thanks, I surely missed the description.
   
   > ByteBuffer API is limited to 32bit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mocobeta commented on a diff in pull request #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r878635805


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -232,55 +224,62 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    try (FileChannel c = FileChannel.open(path, StandardOpenOption.READ)) {
-      final String resourceDescription = "MMapIndexInput(path=\"" + path.toString() + "\")";
-      final boolean useUnmap = getUseUnmap();
-      return ByteBufferIndexInput.newInstance(
-          resourceDescription,
-          map(resourceDescription, c, 0, c.size()),
-          c.size(),
-          chunkSizePower,
-          new ByteBufferGuard(resourceDescription, useUnmap ? CLEANER : null));
+    final String resourceDescription = "MemorySegmentIndexInput(path=\"" + path.toString() + "\")";
+
+    // Work around for JDK-8259028: we need to unwrap our test-only file system layers
+    path = Unwrappable.unwrapAll(path);
+
+    boolean success = false;
+    final MemorySession session = MemorySession.openShared();
+    try (var fc = FileChannel.open(path)) {
+      final long fileSize = fc.size();
+      final MemorySegment[] segments = map(session, resourceDescription, fc, fileSize);
+      final IndexInput in =
+          MemorySegmentIndexInput.newInstance(
+              resourceDescription, session, segments, fileSize, chunkSizePower);
+      success = true;
+      return in;
+    } finally {
+      if (success == false) {
+        session.close();
+      }
     }
   }
 
-  /** Maps a file into a set of buffers */
-  final ByteBuffer[] map(String resourceDescription, FileChannel fc, long offset, long length)
+  /** Maps a file into a set of segments */
+  final MemorySegment[] map(
+      MemorySession session, String resourceDescription, FileChannel fc, long length)
       throws IOException {
     if ((length >>> chunkSizePower) >= Integer.MAX_VALUE)
-      throw new IllegalArgumentException(
-          "RandomAccessFile too big for chunk size: " + resourceDescription);
+      throw new IllegalArgumentException("File too big for chunk size: " + resourceDescription);
 
     final long chunkSize = 1L << chunkSizePower;
 
-    // we always allocate one more buffer, the last one may be a 0 byte one
-    final int nrBuffers = (int) (length >>> chunkSizePower) + 1;
+    // we always allocate one more segments, the last one may be a 0 byte one
+    final int nrSegments = (int) (length >>> chunkSizePower) + 1;
 
-    ByteBuffer[] buffers = new ByteBuffer[nrBuffers];
+    final MemorySegment[] segments = new MemorySegment[nrSegments];
 
-    long bufferStart = 0L;
-    for (int bufNr = 0; bufNr < nrBuffers; bufNr++) {
-      int bufSize =
-          (int) ((length > (bufferStart + chunkSize)) ? chunkSize : (length - bufferStart));
-      MappedByteBuffer buffer;
+    long startOffset = 0L;
+    for (int segNr = 0; segNr < nrSegments; segNr++) {
+      long segSize = (length > (startOffset + chunkSize)) ? chunkSize : (length - startOffset);

Review Comment:
   minor: can be final?
   also, thanks for changing the type to `long` - I was wondering why the old `bufSize` should be int.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140344703

   I updated the PR to  use an MR-JAR (multi release JAR) to implement the new functionality. It also uses Gradle's toolchain support to automatically download the JDK 19 release just to compile the MR part. This is commented out at the moment, because the Gradle toolchain API does not support EA releases.
   
   The cool thing: Once OpenJDK 19 got out of the door in September, we can enable the toochanin support an then also backport this patch to Lucene 9.x.
   
   The MR-JAR part works the following way:
   - The `MappedByteBuffer` part and all the Java 9+ hacks with Unsafe are moved to a new pkg-private interface `MMapIndexInputProvider`.
   - In the MR-JAR part for Java 19 there's a separate implementation of `MMapIndexInputProvider`, compiled with `--enable-preview`
   - the main `MMapDirectory` has a static initializer that tries to load and link the Java19 `MemorySegmentIndexInputProvider`. It uses method handles for that because reflection would need `setAccessible()`. If loading fails it logs a warning if Java 19 is used, but `--enable-preview` is missing on Java's command line. In all cases it falls back to the default `MappedByteBufferIndexInputProvider` implementation in the main part of JAR file.
   
   I also added distribution test that checks if the built JAR file is a correct MR-JAR and it also works with module system.
   
   The current PR still requires Java 19 as `RUNTIME_JAVA_HOME` because of missing toolchain support. To test the "legacy" byte buffer implementation, one can comment the `--enable-preview` from the test's JVM args.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1313963721

   I haven't dug, but your reasoning could explain why we are seeing a similar speedup across both `BEST_SPEED` and `BEST_COMPRESSION` since they benefit from the bulk merging optimization in a similar way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979984394


##########
gradle/generation/local-settings.gradle:
##########
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   can we give these a consistent name, e.g. `JAVA17_HOME,JAVA19_HOME`? I assume I can set these so that gradle won't download stuff, i'd prefer to use my package manager.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979990105


##########
gradle/generation/local-settings.gradle:
##########
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   Done. BTW, Policeman Jenkins was doing exactly this before the auto-provisioning worked. The JAVA19_HOME env var is currently the only one used. JAVA17_HOME was just added for consistency.
   
   Could you still test this. It will automatically download like any other JAR file from Maven Central. To get rid of it and for testing do `rm -rf ~/.gradle/jdks/*`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140447638

   It is now possible to compile and run tests with different JVMs:
   
   If `RUNTIME_JAVA_HOME` does not point to *exactly* Java 19, it won't compile and give error message. This is needed until we have toolchain support.
   
   To compile, you may pass `JAVA19_HOME` (this is what Jenkins does). You can then still randomize and run the tests against Java 17 (default, no `RUNTIME_JAVA_HOME`) or any other Java version. This *must* be a Java 19 version (exact, otherwise compilation will fail, too).
   
   By this Policeman Jenkins is now doing a full tets of all combinations and makes sure the MR-JAR and the provider approach used works on all platforms.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140776739

   Thanks. Here we only need to compile a specific exact version (19). So for the MR-JAR task this should be fine with auto-discovery.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1293279030

   Please do not wonder why "convEndian" is always on top of the call stack: The method is always called, although the endianness fits. It is just a NOOP on x86.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mcimadamore commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mcimadamore commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1325173861

   > > More investigation is required to understand where the difference comes from.
   > 
   > After playing a bit more with the benchmark, it looks like 1ns is coming from the various alignment checks we need to perform - whereas ~0.5ns is coming from the segment liveness check. Of course, since this is a single call, the cost for these checks cannot be amortized. We can probably improve things a little (esp. when there's no alignment involved), but the numbers above reflect the fact that the memory segment implementation has more checks, so at the nanosecond scale, everything matters of course.
   
   With few optimization (to skip some checks when unaligned layouts are used, as well as to optimize other checks), I can get down to this (*):
   
   ```
   Benchmark                 (ELEM_SIZE)  Mode  Cnt   Score   Error  Units
   SegmentCopy.segment_copy            5  avgt   30   4.985 ± 0.030  ns/op
   SegmentCopy.segment_copy           10  avgt   30   4.853 ± 0.144  ns/op
   SegmentCopy.segment_copy           50  avgt   30   5.926 ± 0.066  ns/op
   SegmentCopy.segment_copy          100  avgt   30   8.272 ± 0.203  ns/op
   SegmentCopy.segment_copy          500  avgt   30  34.983 ± 0.226  ns/op
   SegmentCopy.segment_copy         1000  avgt   30  31.683 ± 0.161  ns/op
   ```
   
   (*) To reduce the cost of the liveness check I had to use a shared session, as the confinement check seems to be the main offender there.
   
   That said, before we go deeper in this kind of micro-optimizations, it would be better to understand if this is really the underlying cause behind the performance regression.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1317635619

   I was hoping that JDK people do it in the same way for MemorySegment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1313870436

   @uschindler FYI it looks like this change made indexing ~12% faster with `BEST_SPEED` on the stored fields benchmark: http://people.apache.org/~mikemccand/lucenebench/stored_fields_benchmarks.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314033529

   It does not fully explain it, because actually the copy of memory between file and heap is both using `Unsafe#copyMemory`. The only difference is that for ByteBuffer there is some hardcoded limit of 8 or 16 bytes (not exactly sure, have to look it up). If the byte[] not larger than it uses a copy-loop.
   
   In future it would be great if Panama would provide a method in `OutputStream` like `stream.writeBytes(MemorySegment segment, long offset, long count)`. This would allow to not copy at all and use a `dd` like approach. We could then add IndexInput#copyTo to our API. I think we worked on something similar, but for copying a complete off-heap approch would be best.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1313949994

   > @uschindler FYI it looks like this change made indexing ~12% faster with `BEST_SPEED` on the stored fields benchmark: http://people.apache.org/~mikemccand/lucenebench/stored_fields_benchmarks.html
   
   That is interesting. I did not look at indexing, because we write there. I think this is because of maybe bulk merging was faster, am I correct? The new code is possibly cheaper with copying *large* parts from/to heap.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314122540

   Yes you're right!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1293237593

   > If we do both at same time, we won't see a difference between old and new Lucene MMAP (on same version). A JDK upgrade may also change other performance numbers.
   
   Ack -- I turned on just the JDK 19 upgrade last night!
   
   `PKLookup` looks [a bit happy](https://home.apache.org/~mikemccand/lucenebench/PKLookup.html) about the upgrade, but otherwise I don't see other tasks showing much impact.
   
   I'll leave this for a few days, then turn on `--enable-preview`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1253365297

   JDK 19 was released, I am working on the Toolchain support to support the compilation of the MR-JAR. At moment, the code commented out does not yet work, as AdoptOpenJDK / Temurin did not do a release yet: https://github.com/adoptium/adoptium/issues/170


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979985242


##########
gradle/generation/local-settings.gradle:
##########
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   Exactly that what you can do! When you set those environments, Gradle won't download anything.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1293275751

   > > If we do both at same time, we won't see a difference between old and new Lucene MMAP (on same version). A JDK upgrade may also change other performance numbers.
   > 
   > Ack -- I turned on just the JDK 19 upgrade last night!
   > 
   > `PKLookup` looks [a bit happy](https://home.apache.org/~mikemccand/lucenebench/PKLookup.html) about the upgrade, but otherwise I don't see other tasks showing much impact.
   > 
   > I'll leave this for a few days, then turn on `--enable-preview`.
   
   OK, looks fine thanks!
   
   Once you switched `--enable-preview` on, you should no longer see stack traces like this in the profile:
   
   ```
   1.84%         209556        jdk.internal.misc.Unsafe#convEndian()
                                 at jdk.internal.misc.Unsafe#getIntUnaligned()
                                 at jdk.internal.misc.ScopedMemoryAccess#getIntUnalignedInternal()
                                 at jdk.internal.misc.ScopedMemoryAccess#getIntUnaligned()
                                 at java.nio.DirectByteBuffer#getInt()
                                 at java.nio.DirectByteBuffer#getInt()
                                 at org.apache.lucene.store.ByteBufferGuard#getInt()
                                 at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readInt()
   ```
   
   They should be visible as calls from `MemorySegmentIndexInput` to `MemorySegment.get()`; `ByteBufferIndexInput` calling `DirectByteBuffer` and so on should diappear.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mcimadamore commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mcimadamore commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1291775655

   @uschindler Just curious - did you get any feedback re. performance? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1257975327

   OpenJDK 19 was released by Eclipse Temurin: https://adoptium.net/de/temurin/releases?version=19
   
   Thanks @gdams and @adoptium.
   
   @msokolov: I will merge this to main, 9.x and 9.4 after adding changes.txt. When building the release you may need to remove your gradle.properties file in Lucene's directory. I will post a message.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1317520371

   @uschindler It looks like this benchmark has a noticeable slowdown with this change? https://home.apache.org/~mikemccand/lucenebench/MedTermDayTaxoFacets.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a diff in pull request #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r878662352


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -232,55 +224,62 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    try (FileChannel c = FileChannel.open(path, StandardOpenOption.READ)) {
-      final String resourceDescription = "MMapIndexInput(path=\"" + path.toString() + "\")";
-      final boolean useUnmap = getUseUnmap();
-      return ByteBufferIndexInput.newInstance(
-          resourceDescription,
-          map(resourceDescription, c, 0, c.size()),
-          c.size(),
-          chunkSizePower,
-          new ByteBufferGuard(resourceDescription, useUnmap ? CLEANER : null));
+    final String resourceDescription = "MemorySegmentIndexInput(path=\"" + path.toString() + "\")";
+
+    // Work around for JDK-8259028: we need to unwrap our test-only file system layers
+    path = Unwrappable.unwrapAll(path);
+
+    boolean success = false;
+    final MemorySession session = MemorySession.openShared();
+    try (var fc = FileChannel.open(path)) {
+      final long fileSize = fc.size();
+      final MemorySegment[] segments = map(session, resourceDescription, fc, fileSize);
+      final IndexInput in =
+          MemorySegmentIndexInput.newInstance(
+              resourceDescription, session, segments, fileSize, chunkSizePower);
+      success = true;
+      return in;
+    } finally {
+      if (success == false) {
+        session.close();
+      }
     }
   }
 
-  /** Maps a file into a set of buffers */
-  final ByteBuffer[] map(String resourceDescription, FileChannel fc, long offset, long length)
+  /** Maps a file into a set of segments */
+  final MemorySegment[] map(
+      MemorySession session, String resourceDescription, FileChannel fc, long length)
       throws IOException {
     if ((length >>> chunkSizePower) >= Integer.MAX_VALUE)
-      throw new IllegalArgumentException(
-          "RandomAccessFile too big for chunk size: " + resourceDescription);
+      throw new IllegalArgumentException("File too big for chunk size: " + resourceDescription);
 
     final long chunkSize = 1L << chunkSizePower;
 
-    // we always allocate one more buffer, the last one may be a 0 byte one
-    final int nrBuffers = (int) (length >>> chunkSizePower) + 1;
+    // we always allocate one more segments, the last one may be a 0 byte one
+    final int nrSegments = (int) (length >>> chunkSizePower) + 1;
 
-    ByteBuffer[] buffers = new ByteBuffer[nrBuffers];
+    final MemorySegment[] segments = new MemorySegment[nrSegments];
 
-    long bufferStart = 0L;
-    for (int bufNr = 0; bufNr < nrBuffers; bufNr++) {
-      int bufSize =
-          (int) ((length > (bufferStart + chunkSize)) ? chunkSize : (length - bufferStart));
-      MappedByteBuffer buffer;
+    long startOffset = 0L;
+    for (int segNr = 0; segNr < nrSegments; segNr++) {
+      long segSize = (length > (startOffset + chunkSize)) ? chunkSize : (length - startOffset);

Review Comment:
   I just rrwrote the old code. Yes could be final. We could also use `final var` which looks like bullshit :-)
   
   Previously it was `int` because it was about bytebufers, which are limited to 2 Gigabytes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mocobeta commented on a diff in pull request #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r878666674


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -232,55 +224,62 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    try (FileChannel c = FileChannel.open(path, StandardOpenOption.READ)) {
-      final String resourceDescription = "MMapIndexInput(path=\"" + path.toString() + "\")";
-      final boolean useUnmap = getUseUnmap();
-      return ByteBufferIndexInput.newInstance(
-          resourceDescription,
-          map(resourceDescription, c, 0, c.size()),
-          c.size(),
-          chunkSizePower,
-          new ByteBufferGuard(resourceDescription, useUnmap ? CLEANER : null));
+    final String resourceDescription = "MemorySegmentIndexInput(path=\"" + path.toString() + "\")";
+
+    // Work around for JDK-8259028: we need to unwrap our test-only file system layers
+    path = Unwrappable.unwrapAll(path);
+
+    boolean success = false;
+    final MemorySession session = MemorySession.openShared();
+    try (var fc = FileChannel.open(path)) {
+      final long fileSize = fc.size();
+      final MemorySegment[] segments = map(session, resourceDescription, fc, fileSize);
+      final IndexInput in =
+          MemorySegmentIndexInput.newInstance(
+              resourceDescription, session, segments, fileSize, chunkSizePower);
+      success = true;
+      return in;
+    } finally {
+      if (success == false) {
+        session.close();
+      }
     }
   }
 
-  /** Maps a file into a set of buffers */
-  final ByteBuffer[] map(String resourceDescription, FileChannel fc, long offset, long length)
+  /** Maps a file into a set of segments */
+  final MemorySegment[] map(
+      MemorySession session, String resourceDescription, FileChannel fc, long length)
       throws IOException {
     if ((length >>> chunkSizePower) >= Integer.MAX_VALUE)
-      throw new IllegalArgumentException(
-          "RandomAccessFile too big for chunk size: " + resourceDescription);
+      throw new IllegalArgumentException("File too big for chunk size: " + resourceDescription);
 
     final long chunkSize = 1L << chunkSizePower;
 
-    // we always allocate one more buffer, the last one may be a 0 byte one
-    final int nrBuffers = (int) (length >>> chunkSizePower) + 1;
+    // we always allocate one more segments, the last one may be a 0 byte one
+    final int nrSegments = (int) (length >>> chunkSizePower) + 1;
 
-    ByteBuffer[] buffers = new ByteBuffer[nrBuffers];
+    final MemorySegment[] segments = new MemorySegment[nrSegments];
 
-    long bufferStart = 0L;
-    for (int bufNr = 0; bufNr < nrBuffers; bufNr++) {
-      int bufSize =
-          (int) ((length > (bufferStart + chunkSize)) ? chunkSize : (length - bufferStart));
-      MappedByteBuffer buffer;
+    long startOffset = 0L;
+    for (int segNr = 0; segNr < nrSegments; segNr++) {
+      long segSize = (length > (startOffset + chunkSize)) ? chunkSize : (length - startOffset);

Review Comment:
   I'm resolving this - it's a local variable in the loop with a smaller scope, I just noticed that it looks that other parts of this patch try to use final as far as possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #912: Initial rewrite of MMapDirectory for JDK-19 preview Panama APIs (>= JDK-19-ea+23)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140345113

   Hi @dweiss: Could you have a look at what I am planning, especially the MR-JAR and how I intend to use the Gradle toolchain API to compile against Java 19 in the future?
   
   If you know a solution how to use EA releases with Toolchain API, tell me. I gave up on this. The URLS for AdoptOpenJDK are hardcoded in Gradle, see: https://github.com/gradle/gradle/issues/14515 and https://github.com/gradle/gradle/issues/14814


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org