You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/02 14:42:45 UTC

[GitHub] [lucene-solr] uschindler opened a new pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

uschindler opened a new pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176

This is just a draft PR for a first insight on memory maaping improvements in JDK 16+.

Some background information: Starting with JDK-14, there is a new incubating module "jdk.incubator.foreign" that has a new, not yet stable API for accessing off-heap memory (and later it will also support calling functions using classical MethodHandles that are located in libraries like .so or .dll files). This incubator module has several versions:
- first version: https://openjdk.java.net/jeps/370 (slow, very buggy and thread confinement, so making it unuseable with Lucene)
- second version: https://openjdk.java.net/jeps/383 (still thread confinement, but now allows transfer of "ownership" to other threads; this is still impossible to use with Lucene.
- third version in JDK 16: https://openjdk.java.net/jeps/393 (this version has included "Support for shared segments"). This now allows us to safely use the same external mmaped memory from different threads and also unmap it!

This module more or less overcomes several problems:
- ByteBuffer API is limited to 32bit (in fact MMapDirectory has to chunk in 1 GiB portions)
- There is no official way to unmap ByteBuffers as the file is not used. There is a way to use `sun.misc.Unsafe` and forcefully unmap segments, but any IndexInput accessing the file from another thread will crush the JVM with SIGSEGV or SIGBUS. We learned to live with that, but that's the main issue.

@uschindler had many discussions with the team at OpenJDK and finally with the third incubator, we have an API that works with Lucene. It was very fruitful discussions (thanks to @mcimadamore !)

With the third incubator we are now finally able to do some tests (especially performance). As this is an incubating module, this PR first changes a bit the build system:
- disable `-Werror` for `:lucene:core`
- add the incubating module to compiler of `:lucene:core` and enable it for all test builds. This is important, as you have to mass `--add-modules jdk.incubator.foreign` also at runtime!

The code basically just modifies `MMapDirectory` to use LONG instead of INT for the chunk size parameter. In addition it adds `MemorySegmentIndexInput` that is a copy of our `ByteBufferIndexInput` (still there, but unused), but using MemorySegment instead of ByteBuffer behind the scenes. It works in exactly the same way, just the try/catch blocks for supporting EOFException or moving to another segment were rewritten.

The openInput code uses `MemorySegment.mapFile()` to get a memory mapping. This method is unfortunately a bit buggy in JDK-16-ea-b30, so I added some workarounds. See JDK issues: https://bugs.openjdk.java.net/browse/JDK-8259027, https://bugs.openjdk.java.net/browse/JDK-8259028, https://bugs.openjdk.java.net/browse/JDK-8259032, https://bugs.openjdk.java.net/browse/JDK-8259034

It passes all tests and it looks like you can use it to read indexes. The default chunk size is now 16 GiB (but you can raise or lower it as you like; tests are doing this). Of course you can set it to Long.MAX_VALUE, in that case every index file is always mapped to one big memory mapping. My testing with Windows 10 have shown, that this is *not a good idea!!!*. Huge mappings fragment address space over time and as we can only use like 43 or 46 bits (depending on OS), the fragmentation will at some point kill you. So 16 GiB looks like a good compromise: Most files will be smaller than 6 GiB anyways (unless you optimize your index to one huge segment). So for most Lucene installations, the number of segments will equal the number of open files, so Elasticsearch huge user consumers will be very happy. The sysctl max_map_count may not need to be touched anymore.

In addition, this implements `readLELongs` in a better way than @jpountz did (no caching or arbitrary objects). Nevertheless, as the new MemorySegment API relies on final, unmodifiable classes and coping memory from a MemorySegment to a on-heap Java array, it requires us to wrap all those arrays using a MemorySegment each time (e.g. in `readBytes()` or `readLELongs`), there may be some overhead du to short living object allocations (those are NOT reuseable!!!). _In short: In future we should throw away on coping/loading our stuff to heap and maybe throw away IndexInput completely and base our code fully on random access. The new foreign-vector APIs will in future also be written with MemorySegment in its focus. So you can allocate a vector view on a MemorySegment and let the vectorizer fully work outside java heap inside our mmapped files! :-)_

It would be good if you could checkout this branch and try it in production.

But be aware:
- You need JDK 11 to run Gradle (set `JAVA_HOME` to it)
- You need JDK 16-ea-b30 (set `RUNTIME_JAVA_HOME` to it)
- The lucene-core.jar will be JDK16 class files and requires JDK-16 to execute.
- Also you need to add `--add-modules jdk.incubator.foreign` to the command line of your Java program/Solr server/Elasticsearch server

It would be good to get some benchmarks, especially by @rmuir or @mikemccand. Take your time and enjoy the complexity.

My plan is the following:
- report any bugs or slowness, especially with Hotspot optimizaions. The last time I talked to Maurizio, he taked about Hotspot not being able to fully optimize for-loops with long instead of int, so it may take some time until the full performance is there.
- wait until the final version of project PANAMA-foreign goes into Java's Core Library (no module needed anymore)
- add a MR-JAR for lucene-core.jar and compile the MemorySegmentIndexInput and maybe some helper classes with JDK 17/18/19 (hopefully?).

In addition there are some comments in the code tlaking about safety (e.g., we need `IOUtils.close()` taking `AutoCloseable` instead of just `Closeable`, so we can also enfoce that all memory segments are closed after usage. In addition, by default all VarHandles are aligned. By default it refuses to read a LONG from an address which is not a multiple of 8. I had to disable this feature, as all our index files are heavily unaliged. We should in meantime not only convert our files to little endian, but also make all non-compressed types (like `long[]` arrays or non-encoded integers be aligned to the correct boundaries in files). The most horrible thing I have seen is that our CFS file format starts the "inner" files totally unaligned. We should fix the CFSWriter to start new files always at multiples of 8 bytes. I will open an issue about this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753673096


   Hi read the instructions above, at "be aware". Gradle ist Not compatible to JDK 16 at all. So Gradle needs to run with JDK 11. You can pass the JDK 16 directory using an environment variable or through sysprop. Gradlew shows all options when you run it without any options and read instructions about alternative jvms.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551191931



##########
File path: gradle/defaults-java.gradle
##########
@@ -60,3 +60,14 @@ allprojects {
     }
   }
 }
+
+configure(project(":lucene:core")) {

Review comment:
       We can't release that anyways. Earliest ist jdk 17, if not 18. The API is not stable at all.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

dweiss commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551179881



##########
File path: gradle/defaults-java.gradle
##########
@@ -60,3 +60,14 @@ allprojects {
     }
   }
 }
+
+configure(project(":lucene:core")) {

Review comment:
       So if we want this in before JDK16 becomes the minimum platform we'd need to use a multi-release jar again, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) Panama APIs (>= JDK-16-ea-b32)

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-855874294


   I will close this and would like to move discussion over to https://github.com/apache/lucene/pull/173


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753691565


   Hi @msokolov: I removed some incorrect assert, which may make test fail on linux/mac.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754304561


   This would also explain slowdowns on *short* posting-list iterations, as readLELongs() with smaller number of longs is not working well.
   
   My plan would be to do an if statement at beginning of readBytes() or readLELongs() like:
   ```java
   if (length < SOME_LIMIT) {
     for (int i = offset; i< offset + length; i++) {
       bytes[i] = readByte();
     }
   } else {
     ... current code with targetSlice ...
   }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753547873


   I fixed the rmeinaing TODOs regarding a safe close of *all* segments, when exceptions on `map()` occur. When closing the master IndexInput, we also make sure to unmap all segments, although exceptions might occur (e.g. on concurrent access, `close()` may fail with IllegalStateException). Those exceptions are bubbled up.
   
   As MemorySegment does not implement `Closeable` but the more generic `AutoCloseable`, I used  `IOUtils.applyToAll()` with `MemorySegment::close` as method reference to the close method (heavy functional interface adaption, ey?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551061081



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!

Review comment:
       no, we need some return value so we can call it and return a value. But as it is throwing an exception anyways, the return value does not matter.
   
   These tricks are often used for exception handlers to delegate "exception handling" to a method where you know it will never return and just bubble up the exception:
   - if the exception is known before (by type), you use `throw callSomeMethod(param)`. The pro thing: it's clean as you delegate the instantiation of the exception to the factory, but throw it on your own. The problem here is: only works with a known exception type - and a single one, because the method needs it as return type. A generic `Exception` is a desaster, as it requires to declare that on throws! 
   - an alternative is (and that's used here): put the throw clause also into the method, so the only sense of our method is throwing some exception, it never returns cleanly. Backside: the method needs a return type, so you can do: `return callSomeMethod()` from the caller. Of course you can add a separate return statement after the method call, but that looks more bullshitty.
   
   Here we use the second approch. The most common return type to all callers of this method is `byte`. Compiler is happy, voila!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753512086


   FYI, after I was able to run tests in a more convenient way, I figured out that some tests fail from time to time because of using `HandleTrackingFS`. I disabled all custom filesystems in the PR, but it looks like some tests use `HandleTrackingFS`.
   
   I forgot to mention this in my description above: MMapDirectory no longer works with custom java.nio.filesystem.FileSystem, as it tries to cast the FileChannel internally (see JDK issue: https://bugs.openjdk.java.net/browse/JDK-8259028). For now we have to disable all custom file systems in our test suite until this bug is fixed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754153605


   I ran some vector search performance tests even though this isn't probably the most representative test, but since I have been running these for other reasons lately and I'm all geared up for it, I thought I would fire them up with these changes. I see a pronounced slowdown there, about 40% increase in latency for these KNN searches with this patch (I ran both conditions w/JDK16). Most of the workload from these searches are random-access reads of bytes from the index, conversion to float[], and computing dot products.
   
   I did recently discover that we were (inadvertently) reading and writing the `float[]` vectors that these searches are based on as big-endian, so perhaps if (when) we switch that (I have a pending patch), it would work better in conjunction with this??, although currently on master, at the `IndexInput` level it is just reading bytes on the heap and only afterwards converting those to `float[]`, so it doesn't seem likely there would be any interaction there; I just mention it since it's top of mind, and since I plan to add `DataInput.readFloats()`, which would be needed here too.
   
   Then I also ran luceneutil tests and see interesting mixed results:
   
   ```
                       TaskQPS baseline      StdDevQPS candidate      StdDev                Pct diff p-value
      BrowseMonthTaxoFacets        2.15      (7.7%)        1.19      (2.1%)  -44.5% ( -50% -  -37%) 0.000
   BrowseDayOfYearTaxoFacets        2.03      (8.1%)        1.16      (2.4%)  -43.0% ( -49% -  -35%) 0.000
       BrowseDateTaxoFacets        2.03      (8.2%)        1.16      (2.3%)  -42.9% ( -49% -  -35%) 0.000
                    Respell       52.33      (2.4%)       46.89      (1.7%)  -10.4% ( -14% -   -6%) 0.000
      BrowseMonthSSDVFacets       12.00     (13.2%)       10.81      (6.9%)  -10.0% ( -26% -   11%) 0.003
                   PKLookup      132.89      (2.3%)      120.09      (1.3%)   -9.6% ( -13% -   -6%) 0.000
                AndHighHigh       24.98      (3.0%)       22.61      (2.1%)   -9.5% ( -14% -   -4%) 0.000
            MedSloppyPhrase       15.66      (3.9%)       14.37      (2.0%)   -8.2% ( -13% -   -2%) 0.000
                LowSpanNear       96.34      (2.3%)       88.72      (1.6%)   -7.9% ( -11% -   -4%) 0.000
           HighSloppyPhrase       10.57      (4.6%)        9.75      (2.7%)   -7.7% ( -14% -    0%) 0.000
                    Prefix3       37.23      (7.2%)       34.53      (6.1%)   -7.2% ( -19% -    6%) 0.001
                MedSpanNear       15.63      (1.5%)       14.52      (2.0%)   -7.1% ( -10% -   -3%) 0.000
            LowSloppyPhrase       15.11      (4.4%)       14.06      (2.5%)   -6.9% ( -13% -    0%) 0.000
                 AndHighMed       72.87      (3.4%)       67.88      (3.0%)   -6.8% ( -12% -    0%) 0.000
                 AndHighLow      366.41      (3.2%)      342.99      (3.6%)   -6.4% ( -12% -    0%) 0.000
                  OrHighMed       69.00      (2.6%)       65.09      (2.7%)   -5.7% ( -10% -    0%) 0.000
                    LowTerm      930.90      (5.7%)      878.74      (4.3%)   -5.6% ( -14% -    4%) 0.000
                     Fuzzy2       46.43      (7.2%)       43.94      (5.4%)   -5.4% ( -16% -    7%) 0.008
                     Fuzzy1       64.31      (6.9%)       60.98      (6.5%)   -5.2% ( -17% -    8%) 0.014
               HighSpanNear       19.07      (1.7%)       18.12      (2.6%)   -5.0% (  -9% -    0%) 0.000
                  MedPhrase       21.45      (2.4%)       20.53      (1.9%)   -4.3% (  -8% -    0%) 0.000
       HighTermTitleBDVSort       73.32     (16.1%)       70.33     (17.0%)   -4.1% ( -32% -   34%) 0.437
                 OrHighHigh       16.82      (1.7%)       16.19      (1.9%)   -3.7% (  -7% -    0%) 0.000
   BrowseDayOfYearSSDVFacets        9.56      (8.7%)        9.24      (6.2%)   -3.4% ( -16% -   12%) 0.159
       HighIntervalsOrdered        9.02      (0.6%)        8.78      (0.7%)   -2.7% (  -4% -   -1%) 0.000
               OrNotHighMed      324.93      (2.7%)      316.20      (3.6%)   -2.7% (  -8% -    3%) 0.007
                   Wildcard      129.01      (2.5%)      126.91      (2.6%)   -1.6% (  -6% -    3%) 0.046
      HighTermDayOfYearSort      113.01     (11.9%)      111.68     (10.6%)   -1.2% ( -21% -   24%) 0.741
               OrNotHighLow      539.19      (2.8%)      534.10      (2.8%)   -0.9% (  -6% -    4%) 0.284
                  OrHighLow      314.01      (5.4%)      311.23      (5.9%)   -0.9% ( -11% -   11%) 0.620
          HighTermMonthSort       96.00      (9.6%)       95.48     (10.1%)   -0.5% ( -18% -   21%) 0.863
                 TermDTSort      100.42     (10.9%)      100.52     (13.4%)    0.1% ( -21% -   27%) 0.980
                  LowPhrase      259.52      (1.9%)      262.37      (2.2%)    1.1% (  -2% -    5%) 0.091
                   HighTerm      872.57      (3.9%)      892.07      (3.4%)    2.2% (  -4% -    9%) 0.052
                    MedTerm     1003.36      (4.8%)     1029.34      (4.8%)    2.6% (  -6% -   12%) 0.086
                 HighPhrase      220.30      (1.9%)      228.63      (3.3%)    3.8% (  -1% -    9%) 0.000
              OrHighNotHigh      372.34      (3.4%)      391.32      (4.2%)    5.1% (  -2% -   13%) 0.000
              OrNotHighHigh      389.61      (2.7%)      415.68      (5.5%)    6.7% (  -1% -   15%) 0.000
               OrHighNotMed      409.35      (1.9%)      445.76      (6.9%)    8.9% (   0% -   18%) 0.000
               OrHighNotLow      513.64      (4.2%)      561.96      (7.9%)    9.4% (  -2% -   22%) 0.000
                     IntNRQ      203.68      (2.3%)      224.37      (4.7%)   10.2% (   3% -   17%) 0.000
   ```
   
   note on the setup: I had to tinker with luceneutil a bit to get it to use JDK16 runtime and add the foreign memory access module. gradle seemed to work OK given the environment variable setup discussed above. I did check the command line to make sure it was in fact using JDK16, and I saw it fail once (before I added the module import command line argument), so I'm sure it is.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753503210


   @dweiss I need your help here. There is one thing that drives me crazy: with the changes (actually adding `--add-modules", "jdk.incubator.foreign` to the test command line, JUnit 4 suddely fails for many "non-tests" with exceptions like this.
   
   This seems to have to do with some automatism by Gradle or by JUnit once it detects the "Java module system", and I have no idea how to turn it off. What I am completely wondering about: why does it load those classes at all? I was trying to find the good old "Ant fileset" that has this `**/*Test.class,**/Test*.class` pattern, but this is no longer there with gradle. How to readd it, if JUnit/Gradle seems to think that it needs some (broken) module system API.
   
   Unless this is fixed, I can't run all tests easily. I only ran them per module and ignored the tons of stack traces. PLEASE HELP!
   
   ```
   org.apache.lucene.backward_codecs.lucene60.Lucene60PointsWriter > initializationError FAILED
       java.lang.IllegalArgumentException: Test class can only have one constructor
           at org.junit.runners.model.TestClass.<init>(TestClass.java:48)
           at org.junit.runners.JUnit4.<init>(JUnit4.java:23)
           at org.junit.internal.builders.JUnit4Builder.runnerForClass(JUnit4Builder.java:10)
           at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
           at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:37)
           at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
           at org.junit.internal.requests.ClassRequest.createRunner(ClassRequest.java:28)
           at org.junit.internal.requests.MemoizingRequest.getRunner(MemoizingRequest.java:19)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:78)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
           at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
           at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:567)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
           at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
           at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
           at jdk.proxy1/jdk.proxy1.$Proxy2.processTestClass(Unknown Source)
           at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:567)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
           at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
           at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
           at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
           at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
           at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
           at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
           at java.base/java.lang.Thread.run(Thread.java:831)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

dweiss commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551193215



##########
File path: gradle/defaults-java.gradle
##########
@@ -60,3 +60,14 @@ allprojects {
     }
   }
 }
+
+configure(project(":lucene:core")) {

Review comment:
       Ok, just wondering!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551104528



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!

Review comment:
       I see, it saves the (bullshitty) pretense of returning some dummy value when that case will never actually occur - neat




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754299385

Hi @msokolov,
thanks for the speed test - I wanted to wait for the specialists to do it. As said before, we should report slowness (or improvements) to the Hotpsot/OpenJDK people. I was expecting a result like this due to the early stage of the development and misisng Hotspot optimizations. Just to clarify: In your code, you are reading the floats with a series of readInt() from the indexinput, converting them to float?
If that's the case, I would say: this is expected! From what I figured out on the mailing lists: loops using `long` as an index vs. `int` as an index are not supported well by Hotspot yet. The position() argument of ByteBuffers is a 32bits `int`and whenever you read from memory it uses this int (`index += 4`). Those loops are easy optimized. With `MemorySegment`s, the pointer is a 64bit `long` named `curPosition` in our `MemorySegmentIndexInput`. Hotspot has no idea how to optimize that loop of readInt() statements! I am not sure if this is fully true at this stage but that's my first guess, I will discuss this with Maurizio.

Interestingly, I would have expected that `readBytes()` and `readLELongs()` are slower due to the overheads by wrapping all those `slice()` calls and `MemorySegment.ofArray()`all producing short living object instances! But I seem to be proved wrong :-)

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753492065


   > Very exciting! Thank you for leading the way here, Uwe.
   > 
   >> The most horrible thing I have seen is that our CFS file format starts the "inner" files totally unaligned. We should fix the CFSWriter to start new files always at multiples of 8 bytes. I will open an issue about this.
   >
   > I ran into this just yesterday as I was playing around with aligning vectors in their index files to see if any perf bump could be gained (the header seems to be 50 bytes usually, so some padding is needed). And then when I asserted alignment, realized that CFS was messing with it.
   
   In fact for CFS file we don't even need to change the file format / version number. We just ave to make sure that CFS writer starts new inner files aligned to 8 bytes. That should be easy to implement.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-755778832


   Hi @msokolov: I added the readLEFloats() from #2175 here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753672740


   Ah, OK once I set `RUNTIME_JAVA_HOME` I was able to compile OK


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754302399

> No, the floats are read using readBytes() into a ByteBuffer and then converted using a derived FloatBuffer. So I think your intuition about readBytes() is correct?

Huh? Really. I thought you are using the code in readLELongs default implementation on IndexInput. So you also copy stuff 2 times! That's even worse, it would have been better to the ByteBuffer directly from IndexInput: `curSegment.slice(offset, length).asByteBuffer()` would return a view on the bytes. Or alternatively do it like with readLELongs().

But nevertheless, for readBytes the wrapping is extensive and may be a slowdown. Can you try this again using a loop of `Float.intBitsToFloat(input.readInt())`? If this is faster I would say that my first expectation is correct: If the number of integers or single bytes to read is small (maybe <50 or <100) then it might really be better to just have a loop calling readByte() or readLong() instead of the array views.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-761142012


   I removed the hacks for the bugs in JDK. The minimum requirement to test this draft implementation is *JDK-16-ea-b32*.
   
   Thanks @msokolov for the performance tests. I was hoping that it gets a bit better, but we may need to figure out where the speed differences are coming from. I don't think results will change in the new preview build, I just removed the hacks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753511728


   @dweiss I found a workaround by adding the include/excludes as we use in Ant build. Why were those removed? Should I maybe reopen a new issue.
   
   I have the feeling thois comes from the fact that class files compiled with JDK16 and using the new class file format can't be "analyzed" by gradle, so it assumes "no idea, let's run it because its a class file". The automatisms don't seem to work then. With the include/exclude patterns everything went back to normal.
   
   Should we also add those lines to `defaults-tests.gradle`?:
   
   ```
         include '**/*Test.class', '**/Test*.class'
         exclude '**/*$*'
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551061719



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!

Review comment:
       See here: https://github.com/apache/lucene-solr/pull/2176/files#diff-1c8534e50633cd8648f2cb3c886a269a6ebf8a57d817885bd962225d83f4e93fR450
   
   `byte` is fine for a method that can return `long`, `int`, `short`, `byte`,.... or `void`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551194238



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
       The return also has same effect. Compiler understands that code is unreachable afterwards.
   
   Please read my explanation as reply to Sokolov's question. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753491649


   Very exciting! Thank you for leading the way here, Uwe.
   
   > The most horrible thing I have seen is that our CFS file format starts the "inner" files totally unaligned. We should fix the CFSWriter to start new files always at multiples of 8 bytes. I will open an issue about this.
   
   I ran into this just yesterday as I was playing around with aligning vectors in their index files to see if any perf bump could be gained (the header seems to be 50 bytes usually, so some padding is needed). And then when I asserted alignment, realized that CFS was messing with it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551193422



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
       Won't work here (the other code paths do this).
   Here the problem is that it throws IOException or RuntimeRxception (IllegalArgumentEx extends RuntimeEx, EOFException extends checked IOException). There's no common Supertype. This is why option (2) was chosen here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

dweiss commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753673327


   See this, Mike:
   https://github.com/apache/lucene-solr/blob/master/help/jvms.txt


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754153605


   I ran some vector search performance tests even though this isn't probably the most representative test, but since I have been running these for other reasons lately and I'm all geared up for it, I thought I would fire them up with these changes. I see a pronounced slowdown there, about 40% increase in latency for these KNN searches with this patch (I ran both conditions w/JDK16). Most of the workload from these searches are random-access reads of bytes from the index, conversion to float[], and computing dot products.
   
   I did recently discover that we were (inadvertently) reading and writing the `float[]` vectors that these searches are based on as big-endian, so perhaps if (when) we switch that (I have a pending patch), it would work better in conjunction with this??, although currently on master, at the `IndexInput` level it is just reading bytes on the heap and only afterwards converting those to `float[]`, so it doesn't seem likely there would be any interaction there; I just mention it since it's top of mind, and since I plan to add `DataInput.readFloats()`, which would be needed here too.
   
   Then I also ran luceneutil tests and see interesting mixed results:
   
   ```
                       TaskQPS baseline      StdDevQPS candidate      StdDev                Pct diff p-value
      BrowseMonthTaxoFacets        2.15      (7.7%)        1.19      (2.1%)  -44.5% ( -50% -  -37%) 0.000
   BrowseDayOfYearTaxoFacets        2.03      (8.1%)        1.16      (2.4%)  -43.0% ( -49% -  -35%) 0.000
       BrowseDateTaxoFacets        2.03      (8.2%)        1.16      (2.3%)  -42.9% ( -49% -  -35%) 0.000
                    Respell       52.33      (2.4%)       46.89      (1.7%)  -10.4% ( -14% -   -6%) 0.000
      BrowseMonthSSDVFacets       12.00     (13.2%)       10.81      (6.9%)  -10.0% ( -26% -   11%) 0.003
                   PKLookup      132.89      (2.3%)      120.09      (1.3%)   -9.6% ( -13% -   -6%) 0.000
                AndHighHigh       24.98      (3.0%)       22.61      (2.1%)   -9.5% ( -14% -   -4%) 0.000
            MedSloppyPhrase       15.66      (3.9%)       14.37      (2.0%)   -8.2% ( -13% -   -2%) 0.000
                LowSpanNear       96.34      (2.3%)       88.72      (1.6%)   -7.9% ( -11% -   -4%) 0.000
           HighSloppyPhrase       10.57      (4.6%)        9.75      (2.7%)   -7.7% ( -14% -    0%) 0.000
                    Prefix3       37.23      (7.2%)       34.53      (6.1%)   -7.2% ( -19% -    6%) 0.001
                MedSpanNear       15.63      (1.5%)       14.52      (2.0%)   -7.1% ( -10% -   -3%) 0.000
            LowSloppyPhrase       15.11      (4.4%)       14.06      (2.5%)   -6.9% ( -13% -    0%) 0.000
                 AndHighMed       72.87      (3.4%)       67.88      (3.0%)   -6.8% ( -12% -    0%) 0.000
                 AndHighLow      366.41      (3.2%)      342.99      (3.6%)   -6.4% ( -12% -    0%) 0.000
                  OrHighMed       69.00      (2.6%)       65.09      (2.7%)   -5.7% ( -10% -    0%) 0.000
                    LowTerm      930.90      (5.7%)      878.74      (4.3%)   -5.6% ( -14% -    4%) 0.000
                     Fuzzy2       46.43      (7.2%)       43.94      (5.4%)   -5.4% ( -16% -    7%) 0.008
                     Fuzzy1       64.31      (6.9%)       60.98      (6.5%)   -5.2% ( -17% -    8%) 0.014
               HighSpanNear       19.07      (1.7%)       18.12      (2.6%)   -5.0% (  -9% -    0%) 0.000
                  MedPhrase       21.45      (2.4%)       20.53      (1.9%)   -4.3% (  -8% -    0%) 0.000
       HighTermTitleBDVSort       73.32     (16.1%)       70.33     (17.0%)   -4.1% ( -32% -   34%) 0.437
                 OrHighHigh       16.82      (1.7%)       16.19      (1.9%)   -3.7% (  -7% -    0%) 0.000
   BrowseDayOfYearSSDVFacets        9.56      (8.7%)        9.24      (6.2%)   -3.4% ( -16% -   12%) 0.159
       HighIntervalsOrdered        9.02      (0.6%)        8.78      (0.7%)   -2.7% (  -4% -   -1%) 0.000
               OrNotHighMed      324.93      (2.7%)      316.20      (3.6%)   -2.7% (  -8% -    3%) 0.007
                   Wildcard      129.01      (2.5%)      126.91      (2.6%)   -1.6% (  -6% -    3%) 0.046
      HighTermDayOfYearSort      113.01     (11.9%)      111.68     (10.6%)   -1.2% ( -21% -   24%) 0.741
               OrNotHighLow      539.19      (2.8%)      534.10      (2.8%)   -0.9% (  -6% -    4%) 0.284
                  OrHighLow      314.01      (5.4%)      311.23      (5.9%)   -0.9% ( -11% -   11%) 0.620
          HighTermMonthSort       96.00      (9.6%)       95.48     (10.1%)   -0.5% ( -18% -   21%) 0.863
                 TermDTSort      100.42     (10.9%)      100.52     (13.4%)    0.1% ( -21% -   27%) 0.980
                  LowPhrase      259.52      (1.9%)      262.37      (2.2%)    1.1% (  -2% -    5%) 0.091
                   HighTerm      872.57      (3.9%)      892.07      (3.4%)    2.2% (  -4% -    9%) 0.052
                    MedTerm     1003.36      (4.8%)     1029.34      (4.8%)    2.6% (  -6% -   12%) 0.086
                 HighPhrase      220.30      (1.9%)      228.63      (3.3%)    3.8% (  -1% -    9%) 0.000
              OrHighNotHigh      372.34      (3.4%)      391.32      (4.2%)    5.1% (  -2% -   13%) 0.000
              OrNotHighHigh      389.61      (2.7%)      415.68      (5.5%)    6.7% (  -1% -   15%) 0.000
               OrHighNotMed      409.35      (1.9%)      445.76      (6.9%)    8.9% (   0% -   18%) 0.000
               OrHighNotLow      513.64      (4.2%)      561.96      (7.9%)    9.4% (  -2% -   22%) 0.000
                     IntNRQ      203.68      (2.3%)      224.37      (4.7%)   10.2% (   3% -   17%) 0.000
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754304561


   This would also explain slowdowns on *short* posting-list iterations, as readLELongs() with smaller number of longs is not working well.
   
   My plan would be to do an if statement at beginning of readBytes() or readLELongs() like:
   ```
   if (length < SOME_LIMIT) {
     for (int i = offset; i< offset + length; i++) {
       bytes[i] = readByte();
     }
   } else {
     ... current code...
   }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551195154



##########
File path: gradle/defaults-java.gradle
##########
@@ -60,3 +60,14 @@ allprojects {
     }
   }
 }
+
+configure(project(":lucene:core")) {

Review comment:
       The main reason for this PR is testing and feedback to JDK team.
   Jenkins tests it and we can feedback. 4 issues already reported.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753503210


   @dweiss I need your help here. There is one thing that drives me crazy: with the changes (actually adding `--add-modules jdk.incubator.foreign` to the test command line, JUnit 4 suddely fails for many "non-tests" with exceptions like this.
   
   This seems to have to do with some automatism by Gradle or by JUnit once it detects the "Java module system", and I have no idea how to turn it off. What I am completely wondering about: why does it load those classes at all? I was trying to find the good old "Ant fileset" that has this `**/*Test.class,**/Test*.class` pattern, but this is no longer there with gradle. How to readd it, if JUnit/Gradle seems to think that it needs some (broken) module system API.
   
   Unless this is fixed, I can't run all tests easily. I only ran them per module and ignored the tons of stack traces. PLEASE HELP!
   
   ```
   org.apache.lucene.backward_codecs.lucene60.Lucene60PointsWriter > initializationError FAILED
       java.lang.IllegalArgumentException: Test class can only have one constructor
           at org.junit.runners.model.TestClass.<init>(TestClass.java:48)
           at org.junit.runners.JUnit4.<init>(JUnit4.java:23)
           at org.junit.internal.builders.JUnit4Builder.runnerForClass(JUnit4Builder.java:10)
           at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
           at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:37)
           at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
           at org.junit.internal.requests.ClassRequest.createRunner(ClassRequest.java:28)
           at org.junit.internal.requests.MemoizingRequest.getRunner(MemoizingRequest.java:19)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:78)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
           at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
           at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:567)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
           at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
           at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
           at jdk.proxy1/jdk.proxy1.$Proxy2.processTestClass(Unknown Source)
           at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:567)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
           at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
           at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
           at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
           at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
           at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
           at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
           at java.base/java.lang.Thread.run(Thread.java:831)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551197902



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
       Ah now I understand... You declare it to return RuntimeException, but never return it. Instead throw exception directly, return type is just a fake.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754627642


   Uwe, I didn't think that IndexInput would expose its internal ByteBuffers easily? But you are right about the double-copying - that is why I opened https://issues.apache.org/jira/browse/LUCENE-9652. BTW I did test that change with and without this one and saw a similar slowdown there. Possibly a loop with a single float at a time would be better? It's highly counterintutitive to me, but when I get a moment, I will try it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753492065


   > Very exciting! Thank you for leading the way here, Uwe.
   > 
   >> The most horrible thing I have seen is that our CFS file format starts the "inner" files totally unaligned. We should fix the CFSWriter to start new files always at multiples of 8 bytes. I will open an issue about this.
   >
   > I ran into this just yesterday as I was playing around with aligning vectors in their index files to see if any perf bump could be gained (the header seems to be 50 bytes usually, so some padding is needed). And then when I asserted alignment, realized that CFS was messing with it.
   
   In fact for CFS file we don't even need to change the file format / version number. We just have to make sure that CFS writer starts new inner files aligned to 8 bytes. That should be easy to implement.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753650342


   After some cleanup, I also added a workaround for https://bugs.openjdk.java.net/browse/JDK-8259028: In MMapDirectory we try to unwrap the path using reflection. This is for sure a hackydihickhack workaround, but works until this is fixed -- and now all tests pass for me! :-)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551061081



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!

Review comment:
       no, we need some return value so we can call it and return a value. But as it is throwing an exception anyways, the return value does not matter.
   
   These tricks are often used for exception handlers to delegate "exception handling" to a method where you know it will never return and just bubble up the exception:
   - if the exception is known before (by type), you use `throw callSomeMethod(param)`. The pro thing: it's clean as you delegate the instantiation of the exception to the factory, but throw it on your own. The problem here is: only works with a known exception type - and a single one, because the method needs it as return type. A generic `Exception` is a desaster, as it requires to declare that on throws! 
   - an alternative is (and that's used here): put the throw clause also into the method, so the only sense of our method is throwing some exception, it never returns cleanly. Backside: the method needs a return type, so you can do: `return callSomeMethod()` from the caller (to make the compiler understand that we exit the method here - and always). Of course you can add a separate return statement after the method call, but that looks more bullshitty.
   
   Here we use the second approch. The most common return type to all callers of this method is `byte`. Compiler is happy, voila!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753673096


   Hi read the instructions above, at "be aware". Gradle is not compatible to JDK 16 at all. So Gradle needs to run with JDK 11. You can pass the JDK 16 directory using an environment variable or through sysprop. Gradlew shows all options when you run it without any options and read instructions about alternative jvms.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753672225


   Hi Uwe - just trying to get this patch working here; when I try to compile (`gradlew lucene:core:jar`) w/JDK 11 I get this: `Could not target platform: 'Java SE 16' using tool chain: 'JDK 11 (11)'`. So I tried w/JDK16 (hmm might not be the right version?) and get `java.lang.IllegalAccessError: class org.gradle.internal.compiler.java.ClassNameCollector (in unnamed module @0x772989a6) cannot access class com.sun.tools.javac.code.Symbol$TypeSymbol (in module jdk.compiler) because module jdk.compiler does not export com.sun.tools.javac.code to unnamed module @0x772989a6`. Yeah I have 16-ea+30-2130 - i think that was the version you specified?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551059214



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!

Review comment:
       Is that a JDK16 thing? It thinks this is a no-op? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753512086


   FYI, after I was able to run tests in a more convenient way, I figured out that some tests fail from time to time because of using `HandleTrackingFS`. I disabled all custom filesystems in the PR, but it looks like some tests use `HandleTrackingFS`.
   
   I forgot to mention this in my description above: MMapDirectory no longer works with custom `java.nio.filesystem.FileSystem`, as it tries to cast the `FileChannel` returned by our custom filesystem to the JDK-internal class (see JDK issue: https://bugs.openjdk.java.net/browse/JDK-8259028). For now we have to disable all custom file systems in our test suite until this bug is fixed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

dweiss commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551198238



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
       Precisely.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

dweiss commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551196614



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
        I have. But I think the throws conveys it better :); the difference exists at call sites like this one:
   ```
       } catch (IndexOutOfBoundsException e) {
         handlePositionalIOOBE("seek", pos);
       }
   ```
   
   The compiler would allow you to add code after the catch (or after handlePositionalOOBE). Whereas this would have the desired effect (also showing up-front that the method never returns normally):
   ```
       } catch (IndexOutOfBoundsException e) {
         throw handlePositionalIOOBE("seek", pos);
       }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-755789343


   FYI, I added some interface in oal.util.Unwrapable that can be implemented by our mocking layers in test-framework. The new interface allows us to unwrap the Path without knowing about test-framework internals. IMHO, we should use this also for other mocking layers, so we can easy unwrap them with some consistent API.
   
   The remaining 2 issues in MappedByteBuffer.mapFile() were merged today and as soon as those are part of another preview build of JDK 16, we can remove the mapFileBugfix() method.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551208675



##########
File path: gradle/defaults-java.gradle
##########
@@ -60,3 +60,14 @@ allprojects {
     }
   }
 }
+
+configure(project(":lucene:core")) {

Review comment:
       To come back to the MR-JAR: Yes we may need it, but it can be solved without, too.
   
   The IndexInputs are completely separate (so they can life side-by side). The changes to MMapDircetory are mainly a change from int to long for the chunk size (this is why I also not yet removed the methods to query unmapping status, thy would always return true for JDK>xxxx.
   
   When we would have this productive, MMapDirectory would just return one of the IndexInputs that fits your JDK. We may only need to move the code that maps a file to ByteBuffer or MemorySegment to a separate class or maybe the ByteBuffer-/MemorySegmentIndexInput class. Then we have 2 classes staying next to each other, one for JDK 11 and one for JDK 16/17/18+, MMapDirectory just chooses the correct one. Then we won't need a MR JAR, only some tricks to compile a few classes with later `--release`.
   
   With compilation we can use `--release` and of course a newer Gradle version. Like Elasticsearch, people who want to compile Lucene/Solr would need JDK 17, but the code is working for JDK 11+. Elasticsearch does exactly the same: It is tested and compiled for Java 8, but to build it you need JDK 15.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754300727


   > Just to clarify: In your code, you are reading the floats with a series of readInt() from the indexinput, converting them to float?
   
   No, the floats are read using `readBytes()` into a `ByteBuffer` and then converted using a derived `FloatBuffer`. So I think your intuition about `readBytes()` is correct?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

dweiss commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551196614



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
        I have. But I think the throws conveys it better :); the difference exists at call sites like this one:
   {code}
       } catch (IndexOutOfBoundsException e) {
         handlePositionalIOOBE("seek", pos);
       }
   {code}
   
   The compiler would allow you to add code after the catch (or after handlePositionalOOBE). Whereas this would have the desired effect (also showing up-front that the method never returns normally):
   {code}
       } catch (IndexOutOfBoundsException e) {
         throw handlePositionalIOOBE("seek", pos);
       }
   {code}




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551061081



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!

Review comment:
       no, we need some return value so we can call it and return a value. But as it is throwing an exception anyways, the return value does not matter.
   
   These tricks are often used for exception handlers to delegate "exception handling" to a method where you know it will never return and just bubble up the exception:
   - if the exception is known before (by type), you use `throw callSomeMethod(param)`. The pro thing: it's clean as you delegate the instantiation of the exception to the factory, but throw it on your own. The problem here is: only works with a known exception type - and a single one, because the method needs it as return type. A generic `Exception` is a desaster, as it reuires to declare that on throws! 
   - an alternative is (and that's used here): put the throw clause also into the method, so the only sense of our method is throwing some exception, it never returns cleanly. Backside: the method needs a return type, so you can do: `return callSomeMethod()` from the caller. Of course you can add a separate return statement after the method call, but that looks more bullshitty.
   
   Here we use the second approch. The most common return type to all callers of this method is `byte`. Compiler is happy, voila!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754299385

Hi @msokolov,
thanks for the speed test - I wanted to wait for the specialists to do it. As said before, we should report slowness (or improvements) to the Hotpsot/OpenJDK people. I was expecting a result like this due to the early stage of the development and misisng Hotspot optimizations. Just to clarify: In your code, you are reading the floats with a series of readInt() from the indexinput, converting them to float?
If that's the case, I would say: this is expected! From what I figured out on the mailing lists: loops using `long` as an index vs. `int` as an index are not supported well by Hotspot yet. The position() argument of good old ByteBuffers is a 32bits `int` and whenever you read from memory it uses this int (`index += 4`). Those loops are easy optimized. With `MemorySegment`s, the pointer is a 64bit `long` named `curPosition` in our `MemorySegmentIndexInput`. Hotspot has no idea how to optimize that loop of readInt() statements! I am not sure if this is fully true at this stage but that's my first guess, I will discuss this with Maurizio.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551208675



##########
File path: gradle/defaults-java.gradle
##########
@@ -60,3 +60,14 @@ allprojects {
     }
   }
 }
+
+configure(project(":lucene:core")) {

Review comment:
       To come back to the MR-JAR: Yes we may need it, but it can be solved without, too.
   
   The IndexInputs are completely separate (so they can life side-by side). The changes to MMapDircetory are mainly a change from int to long for the chunk size (this is why I also not yet removed the methods to query unmapping status, thy would always return true for JDK>xxxx.
   
   When we would have this productive, MMapDirectory would just return one of the IndexInputs that fits your JDK. We may only need to move the code that maps a file to ByteBuffer or MemorySegment to a separate class or maybe the IndexInput class. Then we have 2 classes staying next to each other, one for JDK 11 and one for JDK 16/17/18+, MMapDircetory just chooses the correct one. Then we won't need a MR JAR.
   
   With compilation we can use `--release` and of course a newer Gradle version. Like Elasticsearch, people who want to compile Lucene/Solr would need JDK 17, but the code is working for JDK 11+. Elasticsearch does exactly the same: It is tested and compiled for Java 8, but to build it you need JDK 15.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754299548


   I ran benchmarks one more time, including vector tasks:
   
   ```
                       TaskQPS baseline      StdDevQPS candidate      StdDev                Pct diff p-value
      BrowseMonthTaxoFacets        2.17      (6.3%)        1.17      (1.6%)  -46.1% ( -50% -  -40%) 0.000
       BrowseDateTaxoFacets        2.06      (6.5%)        1.14      (1.7%)  -44.7% ( -49% -  -39%) 0.000
   BrowseDayOfYearTaxoFacets        2.06      (6.5%)        1.14      (1.7%)  -44.7% ( -49% -  -38%) 0.000
                     Fuzzy2       50.14     (10.0%)       42.75      (9.0%)  -14.7% ( -30% -    4%) 0.000
               OrNotHighLow      471.70      (4.3%)      404.83      (3.7%)  -14.2% ( -21% -   -6%) 0.000
                    Respell       48.12      (1.8%)       41.51      (1.2%)  -13.7% ( -16% -  -11%) 0.000
      BrowseMonthSSDVFacets       12.43      (7.9%)       10.88      (6.6%)  -12.4% ( -24% -    2%) 0.000
                   PKLookup      134.94      (1.0%)      118.56      (1.1%)  -12.1% ( -14% -  -10%) 0.000
                    LowTerm      972.51      (3.9%)      870.89      (2.9%)  -10.4% ( -16% -   -3%) 0.000
                 AndHighLow     1256.92      (4.3%)     1126.63      (4.2%)  -10.4% ( -18% -   -1%) 0.000
                   HighTerm      938.27      (4.5%)      843.52      (3.7%)  -10.1% ( -17% -   -1%) 0.000
              LowTermVector      821.50      (2.4%)      739.43      (1.4%)  -10.0% ( -13% -   -6%) 0.000
                   Wildcard       94.86      (2.1%)       85.51      (1.9%)   -9.9% ( -13% -   -6%) 0.000
                  OrHighMed       98.33      (3.0%)       88.70      (3.0%)   -9.8% ( -15% -   -3%) 0.000
           AndHighLowVector      859.32      (2.2%)      775.74      (1.9%)   -9.7% ( -13% -   -5%) 0.000
           AndHighMedVector      723.85      (4.0%)      654.44      (3.3%)   -9.6% ( -16% -   -2%) 0.000
          AndHighHighVector      825.28      (1.7%)      746.88      (1.5%)   -9.5% ( -12% -   -6%) 0.000
                    MedTerm     1133.18      (4.2%)     1025.84      (2.6%)   -9.5% ( -15% -   -2%) 0.000
                 AndHighMed      163.82      (4.0%)      148.32      (3.3%)   -9.5% ( -16% -   -2%) 0.000
              OrNotHighHigh      432.50      (4.6%)      391.92      (3.9%)   -9.4% ( -17% -    0%) 0.000
              MedTermVector      761.28      (3.9%)      690.36      (3.4%)   -9.3% ( -15% -   -2%) 0.000
            MedSloppyPhrase       44.77      (2.3%)       40.61      (1.6%)   -9.3% ( -12% -   -5%) 0.000
              OrHighNotHigh      457.14      (3.9%)      414.97      (5.6%)   -9.2% ( -17% -    0%) 0.000
             HighTermVector      814.15      (5.0%)      741.66      (5.0%)   -8.9% ( -18% -    1%) 0.000
                AndHighHigh       34.20      (3.6%)       31.39      (3.5%)   -8.2% ( -14% -   -1%) 0.000
                  OrHighLow      318.11      (7.2%)      292.68      (4.9%)   -8.0% ( -18% -    4%) 0.000
               OrNotHighMed      489.33      (4.3%)      451.73      (3.3%)   -7.7% ( -14% -    0%) 0.000
                  MedPhrase      112.51      (2.1%)      103.95      (2.3%)   -7.6% ( -11% -   -3%) 0.000
               HighSpanNear        9.46      (3.1%)        8.76      (2.1%)   -7.4% ( -12% -   -2%) 0.000
            LowSloppyPhrase       98.59      (2.4%)       91.33      (1.3%)   -7.4% ( -10% -   -3%) 0.000
                MedSpanNear       98.82      (2.6%)       91.64      (2.0%)   -7.3% ( -11% -   -2%) 0.000
                     Fuzzy1       52.63      (9.9%)       49.02      (5.8%)   -6.9% ( -20% -    9%) 0.007
                LowSpanNear       93.95      (2.1%)       87.52      (1.6%)   -6.8% ( -10% -   -3%) 0.000
                 HighPhrase      256.37      (3.7%)      239.54      (2.3%)   -6.6% ( -12% -    0%) 0.000
                    Prefix3      227.65      (6.2%)      213.30      (5.9%)   -6.3% ( -17% -    6%) 0.001
           HighSloppyPhrase        9.96      (3.1%)        9.35      (2.2%)   -6.1% ( -11% -    0%) 0.000
                 OrHighHigh       21.71      (2.6%)       20.54      (1.8%)   -5.4% (  -9% -   -1%) 0.000
               OrHighNotMed      471.56      (6.3%)      446.92      (4.6%)   -5.2% ( -15% -    5%) 0.003
               OrHighNotLow      555.62      (4.9%)      526.59      (3.5%)   -5.2% ( -13% -    3%) 0.000
      HighTermDayOfYearSort       33.02      (9.4%)       31.30      (8.0%)   -5.2% ( -20% -   13%) 0.058
                  LowPhrase      310.16      (4.3%)      295.71      (2.5%)   -4.7% ( -11% -    2%) 0.000
       HighIntervalsOrdered       12.32      (1.4%)       11.77      (1.3%)   -4.5% (  -7% -   -1%) 0.000
          HighTermMonthSort      121.01     (14.0%)      116.57     (11.1%)   -3.7% ( -25% -   24%) 0.358
   BrowseDayOfYearSSDVFacets        9.72      (6.0%)        9.39      (1.7%)   -3.3% ( -10% -    4%) 0.017
                 TermDTSort       96.25      (8.7%)       94.99      (8.3%)   -1.3% ( -16% -   17%) 0.626
       HighTermTitleBDVSort      102.37     (16.1%)      101.51     (16.7%)   -0.8% ( -28% -   38%) 0.871
                     IntNRQ      203.00      (0.8%)      202.97      (0.7%)   -0.0% (  -1% -    1%) 0.952
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-755818100


   Hi @msokolov: If you have so time, you may check performance again. I rewrote the getBytes() and getLEXxxx() methods a bit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

dweiss commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551179303



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
       The trick with byte return could be solved in a different way as well - by declaring a (dummy) return type of RuntimeException and replacing {{return handlePositionalIOOBE()}} with {{throw handlePositionalIOOBE(...)}} everywhere else. This has an additional benefit that code paths at invocation points see the code block as throwing an exception (so anything that follows is unreachable).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753693042


   I added test runs on Jenkins: https://jenkins.thetaphi.de/job/Lucene-Solr-jdk16panama-Linux/ and https://jenkins.thetaphi.de/job/Lucene-Solr-jdk16panama-Windows/


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753511728


   @dweiss I found a workaround by adding the include/excludes as we use in Ant build. Why were those removed? Should I maybe reopen a new issue to add them back in master?
   
   I have the feeling thois comes from the fact that class files compiled with JDK16 and using the new class file format can't be "analyzed" by gradle, so it assumes "no idea, let's run it because its a class file". The automatisms don't seem to work then. With the include/exclude patterns everything went back to normal.
   
   Should we also add those lines to `defaults-tests.gradle` in master branch? I don't trust those autodetection, especially it may load read class files that it does not need to look at all?:
   ```
         include '**/*Test.class', '**/Test*.class'
         exclude '**/*$*'
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] jpountz commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

jpountz commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753490290


   > In future we should throw away on coping/loading our stuff to heap and maybe throw away IndexInput completely and base our code fully on random access. The new foreign-vector APIs will in future also be written with MemorySegment in its focus. So you can allocate a vector view on a MemorySegment and let the vectorizer fully work outside java heap inside our mmapped files! :-)
   
   +1 Things work like they do today because auto-vectorization only works on on-heap arrays, and this readLELongs method was the fastest way to copy data from a directory to a long[]. @msokolov and I were just discussing a few days ago how this vector API might help drop the copy entirely in the future, which would be great.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler closed pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) Panama APIs (>= JDK-16-ea-b32)

Posted by GitBox <gi...@apache.org>.

uschindler closed pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-754299385

Hi @msokolov,
thanks for the speed test - I wanted to wait for the specialists to do it. As said before, we should report slowness (or improvements) to the Hotpsot/OpenJDK people. I was expecting a result like this due to the early stage of the development and misisng Hotspot optimizations. Just to clarify: In your code, you are reading the floats with a series of readInt() from the indexinput, converting them to float?
If that's the case, I would say: this is expected! From what I figured out on the mailing lists: loops using `long` as an index vs. ìnt` as an index are not supported well by Hotspot yet. The position() argument of ByteBuffers is a 32bits `int`and whenever you read from memory it uses this int (`index += 4`). Those loops are easy optimized. With `MemorySegment`s, the pointer is a 64bit `long` named `curPosition` in our `MemorySegmentIndexInput`. Hotspot has no idea how to optimize that loop of readInt() statements! I am not sure if this is fully true at this stage but that's my first guess, I will discuss this with Maurizio.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r552031972



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,524 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();

Review comment:
       Thanks for review! In fact that's wanted, because Lucene Apis read any type always from a file offset. The alignment of 1L is wanted because currently we have ints/longs/floats anywhere in files (see PR description).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] mcimadamore commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

mcimadamore commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r552026775



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,524 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();

Review comment:
       Just to be sure here - the offset accepted by this var handle is a _byte_ offset not a short offset. E.g. if you want to read consecutive short elements you need to do:
   ```
   VH_getShort(segment, 0);
   VH_getShort(segment, 2);
   VH_getShort(segment, 4);
   ```
   Alternatively, if you need array-like indexing, but don't want to mess with layouts, you can use the pre-baked statics inside the MemoryAccess class (e.g. `getShortAtIndex` vs. `getShortAtOffset` - the former take a logical short index, the latter takes a raw byte offset).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753511728


   @dweiss I found a workaround by adding the include/excludes as we use in Ant build. Why were those removed? Should I maybe reopen a new issue.
   
   I have the feeling thois comes from the fact that class files compiled with JDK16 and using the new class file format can't be "analyzed" by gradle, so it assumes "no idea, let's run it because its a class file". The automatisms don't seem to work then. With the include/exclude patterns everything went back to normal.
   
   Should we also add those lines to `defaults-tests.gradle` in master branch? I don't trust those autodetection, especially it may load read class files that it does not need to look at all?:
   ```
         include '**/*Test.class', '**/Test*.class'
         exclude '**/*$*'
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-758175143


   @uschindler I pulled latest from this branch (ba61072221905c202450034c0c90409f7073ebb3) and re-ran (comparing to updated master (6711eb7571727552aad3ace53c52c9a8fe07dc40) and see similar results:
   
   ```
                       TaskQPS baseline      StdDevQPS candidate      StdDev                Pct diff p-value
      BrowseMonthTaxoFacets        2.15      (7.0%)        1.20      (5.6%)  -44.2% ( -53% -  -33%) 0.000
   BrowseDayOfYearTaxoFacets        2.04      (7.1%)        1.16      (5.8%)  -42.9% ( -52% -  -32%) 0.000
       BrowseDateTaxoFacets        2.04      (7.1%)        1.17      (5.8%)  -42.8% ( -51% -  -32%) 0.000
                    LowTerm     1099.09      (4.8%)      936.16      (3.2%)  -14.8% ( -21% -   -7%) 0.000
               OrNotHighLow      523.44      (7.0%)      450.50      (3.8%)  -13.9% ( -23% -   -3%) 0.000
                    Respell       42.23      (2.2%)       36.71      (2.5%)  -13.1% ( -17% -   -8%) 0.000
           AndHighMedVector      893.77      (2.9%)      786.53      (3.0%)  -12.0% ( -17% -   -6%) 0.000
                 AndHighLow      662.65      (2.9%)      584.68      (2.0%)  -11.8% ( -16% -   -7%) 0.000
                   PKLookup      136.02      (1.5%)      120.03      (1.5%)  -11.8% ( -14% -   -8%) 0.000
                     Fuzzy1       53.41      (6.4%)       47.18      (5.4%)  -11.7% ( -22% -    0%) 0.000
              MedTermVector     1067.80      (3.2%)      946.10      (2.3%)  -11.4% ( -16% -   -6%) 0.000
           AndHighLowVector      941.02      (3.8%)      834.30      (3.1%)  -11.3% ( -17% -   -4%) 0.000
          AndHighHighVector      908.27      (2.9%)      808.05      (2.1%)  -11.0% ( -15% -   -6%) 0.000
              LowTermVector     1010.16      (2.2%)      899.95      (1.9%)  -10.9% ( -14% -   -6%) 0.000
                    MedTerm     1167.78      (3.9%)     1043.79      (3.1%)  -10.6% ( -17% -   -3%) 0.000
      BrowseMonthSSDVFacets       11.99     (13.4%)       10.73      (7.4%)  -10.5% ( -27% -   11%) 0.002
                 AndHighMed      155.89      (3.5%)      139.90      (2.8%)  -10.3% ( -16% -   -4%) 0.000
               OrNotHighMed      464.05      (4.1%)      416.91      (4.1%)  -10.2% ( -17% -   -2%) 0.000
             HighTermVector     1222.14      (3.5%)     1105.22      (2.2%)   -9.6% ( -14% -   -4%) 0.000
                   Wildcard       91.66      (2.2%)       83.55      (2.4%)   -8.9% ( -13% -   -4%) 0.000
            LowSloppyPhrase       46.92      (2.1%)       42.77      (1.5%)   -8.8% ( -12% -   -5%) 0.000
                LowSpanNear       87.56      (2.7%)       80.02      (2.0%)   -8.6% ( -13% -   -3%) 0.000
               OrHighNotMed      470.32      (5.4%)      429.87      (3.3%)   -8.6% ( -16% -    0%) 0.000
                   HighTerm     1269.54      (7.1%)     1163.40      (5.8%)   -8.4% ( -19% -    4%) 0.000
              OrHighNotHigh      430.03      (4.1%)      395.13      (4.3%)   -8.1% ( -15% -    0%) 0.000
               OrHighNotLow      585.18      (5.0%)      537.96      (5.5%)   -8.1% ( -17% -    2%) 0.000
                AndHighHigh       37.46      (3.2%)       34.45      (2.6%)   -8.0% ( -13% -   -2%) 0.000
                  OrHighLow      576.99      (7.4%)      533.08      (6.4%)   -7.6% ( -19% -    6%) 0.001
                 HighPhrase      261.45      (2.1%)      241.99      (1.8%)   -7.4% ( -11% -   -3%) 0.000
                  LowPhrase      375.57      (3.3%)      347.95      (3.7%)   -7.4% ( -13% -    0%) 0.000
               HighSpanNear       20.85      (2.8%)       19.36      (2.0%)   -7.2% ( -11% -   -2%) 0.000
                  MedPhrase       22.28      (1.8%)       20.77      (1.2%)   -6.8% (  -9% -   -3%) 0.000
            MedSloppyPhrase       15.96      (3.0%)       14.92      (2.5%)   -6.5% ( -11% -    0%) 0.000
                    Prefix3      204.87      (2.0%)      193.03      (2.2%)   -5.8% (  -9% -   -1%) 0.000
              OrNotHighHigh      496.09      (6.6%)      467.47      (4.1%)   -5.8% ( -15% -    5%) 0.001
           HighSloppyPhrase       22.22      (4.0%)       21.02      (3.0%)   -5.4% ( -11% -    1%) 0.000
          HighTermMonthSort      132.66     (11.4%)      126.18     (13.2%)   -4.9% ( -26% -   22%) 0.211
                 TermDTSort       68.55     (13.3%)       65.22     (15.1%)   -4.9% ( -29% -   27%) 0.279
                  OrHighMed       78.75      (3.8%)       74.95      (2.6%)   -4.8% ( -10% -    1%) 0.000
                MedSpanNear       29.19      (2.3%)       27.83      (1.9%)   -4.7% (  -8% -    0%) 0.000
      HighTermDayOfYearSort      105.52     (12.0%)      101.55     (10.1%)   -3.8% ( -23% -   20%) 0.284
                     IntNRQ       76.00     (12.3%)       73.29     (11.9%)   -3.6% ( -24% -   23%) 0.351
       HighIntervalsOrdered       21.82      (1.5%)       21.05      (1.5%)   -3.5% (  -6% -    0%) 0.000
       HighTermTitleBDVSort      114.39     (16.4%)      110.77     (14.9%)   -3.2% ( -29% -   33%) 0.522
                 OrHighHigh       16.17      (3.5%)       15.68      (2.8%)   -3.1% (  -9% -    3%) 0.002
   BrowseDayOfYearSSDVFacets        9.57      (9.0%)        9.29      (5.2%)   -2.9% ( -15% -   12%) 0.221
                     Fuzzy2       30.01      (8.5%)       30.26     (10.2%)    0.8% ( -16% -   21%) 0.781
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler edited a comment on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler edited a comment on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-753503210


   @dweiss I need your help here. There is one thing that drives me crazy: with the changes (actually adding `--add-modules jdk.incubator.foreign` to the test command line), JUnit 4 suddely fails for many "non-tests" with exceptions like this, although those should never have been executed. It's mostly some test helper classes (like the writer for Lucene60Points).
   
   This seems to have to do with some automatism by Gradle or by JUnit once it detects the "Java module system", and I have no idea how to turn it off! What I am completely wondering about: why does it load those classes at all? I was trying to find the good old "Ant fileset" that has this `**/*Test.class,**/Test*.class` pattern, but this is no longer there with gradle. How to re-add it, if JUnit/Gradle seems to think that it needs some (broken) module system API.
   
   Unless this is fixed, I can't run all tests easily. I only ran them per module and ignored the tons of stack traces. PLEASE HELP!
   
   ```
   org.apache.lucene.backward_codecs.lucene60.Lucene60PointsWriter > initializationError FAILED
       java.lang.IllegalArgumentException: Test class can only have one constructor
           at org.junit.runners.model.TestClass.<init>(TestClass.java:48)
           at org.junit.runners.JUnit4.<init>(JUnit4.java:23)
           at org.junit.internal.builders.JUnit4Builder.runnerForClass(JUnit4Builder.java:10)
           at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
           at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:37)
           at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:70)
           at org.junit.internal.requests.ClassRequest.createRunner(ClassRequest.java:28)
           at org.junit.internal.requests.MemoizingRequest.getRunner(MemoizingRequest.java:19)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:78)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
           at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
           at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
           at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:567)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
           at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
           at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
           at jdk.proxy1/jdk.proxy1.$Proxy2.processTestClass(Unknown Source)
           at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:567)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
           at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
           at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
           at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
           at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
           at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
           at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
           at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
           at java.base/java.lang.Thread.run(Thread.java:831)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

Posted by GitBox <gi...@apache.org>.

uschindler commented on a change in pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#discussion_r551202560



##########
File path: lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java
##########
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.lang.invoke.VarHandle;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.util.IOUtils;
+
+import jdk.incubator.foreign.MemoryHandles;
+import jdk.incubator.foreign.MemorySegment;
+
+/**
+ * Base IndexInput implementation that uses an array of MemorySegments to represent a file.
+ *
+ * <p>For efficiency, this class requires that the segment size are a power-of-two
+ * (<code>chunkSizePower</code>).
+ */
+public abstract class MemorySegmentIndexInput extends IndexInput implements RandomAccessInput {
+  // We pass 1L as alignment, because currently Lucene file formats are heavy unaligned: :(
+  static final VarHandle VH_getByte = MemoryHandles.varHandle(byte.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getShort = MemoryHandles.varHandle(short.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getInt = MemoryHandles.varHandle(int.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  static final VarHandle VH_getLong = MemoryHandles.varHandle(long.class, 1L, ByteOrder.BIG_ENDIAN).withInvokeExactBehavior();
+  
+  final boolean isClone;
+  final long length;
+  final long chunkSizeMask;
+  final int chunkSizePower;
+  final MemorySegment[] segments;
+  
+  int curSegmentIndex = -1;
+  MemorySegment curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
+  long curPosition; // relative to curSegment, not globally
+
+  public static MemorySegmentIndexInput newInstance(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower) {
+    if (segments.length == 1) {
+      return new SingleSegmentImpl(resourceDescription, segments[0], length, chunkSizePower, false);
+    } else {
+      return new MultiSegmentImpl(resourceDescription, segments, 0, length, chunkSizePower, false);
+    }
+  }
+
+  private MemorySegmentIndexInput(
+      String resourceDescription,
+      MemorySegment[] segments,
+      long length,
+      int chunkSizePower,
+      boolean isClone) {
+    super(resourceDescription);
+    this.segments = segments;
+    this.length = length;
+    this.chunkSizePower = chunkSizePower;
+    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
+    this.isClone = isClone;
+    this.curSegment = segments[0];
+  }
+  
+  void ensureOpen() {
+    if (curSegment == null) {
+      throw alreadyClosed();
+    }
+  }
+
+  RuntimeException wrapAlreadyClosedException(RuntimeException e) {
+    if (e instanceof NullPointerException) {
+      return alreadyClosed();
+    }
+    // TODO: maybe open a JDK issue to have a separate, more
+    // meaningful exception for unmapped segments:
+    if (e.getMessage() != null && e.getMessage().contains("closed")) {
+      return alreadyClosed();
+    }
+    return e;
+  }
+  
+  // return value only for compiler!
+  byte handlePositionalIOOBE(String action, long pos) throws IOException {

Review comment:
       Committed and pushed this change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org