You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/11/10 17:03:37 UTC

[GitHub] [lucene] jpountz opened a new pull request, #11917: Automatically preload index files that are both tiny and very hot.

jpountz opened a new pull request, #11917:
URL: https://github.com/apache/lucene/pull/11917

   The default codec has a number of small and hot files, that actually used to be fully loaded in memory before we moved them off-heap. In the general case, these files are expected to fully fit into the page cache for things to work well. Should we give control over preloading to codecs? This is what this commit does for the following files:
    - Terms index (`tip`)
    - Points index (`kdi`)
    - Stored fields index (`fdx`)
    - Terms vector index (`tvx`)
   
   This only has an effect on `MMapDirectory`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021839467


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -93,15 +93,8 @@ public class MMapDirectory extends FSDirectory {
    */
   public static final BiPredicate<String, IOContext> NO_FILES = (filename, context) -> false;
 
-  /**
-   * Argument for {@link #setPreload(BiPredicate)} that configures all files that use the {@link
-   * IOContext#LOAD} to be preloaded upon opening them. This is the default.
-   */
-  public static final BiPredicate<String, IOContext> BASED_ON_LOAD_IO_CONTEXT =

Review Comment:
   We can still provide this option.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] rmuir commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314072987

   And 1.75GB is the same amount if you want to mlock the pages. Which no operating system is gonna let you do by default, on linux it will require modifying system soft/hard limits.
   
   If a user has 1TB worth of index data, is it more intuitive that they should run with e.g. at least a 4GB heap? Or that they should go mucking around in /etc/security?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313886416

   I think the predicate would also simplify the current Elasticserach code which instantiates 2 MMapDirectory instances with the same underlying dir and uses FileSwitchDir in front. A predicate is much better to handle this ("which files to preload").


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313875966

   > @uschindler I think I understand Robert's concerns against mlock, but less yours against loading. Do you think that it would be useless to preload if these files can then get paged out?
   
   I have not measured this, but i would say it is better to have a very short delay on the first access to the index and not preload it. If the preloading is so fast and - as you say - would not cause a delay on opening the index, why would it then cause a significant delay on first access?
   
   > My thinking was that most users would rather like Lucene to take a bit longer to open indices if it helps provide better performance on initial queries, noting that it wouldn't make opening much slower given how tiny these files tend to be compared to the overall dataset (~0.17% of the total index size for the index generated by nightly benchmarks). In general these files would stay in the page cache all the time, but if they don't because of unusual access patterns or competing processes and the page cache decides to page out some data, it wouldn't be much worse than today. I do think some users will see appeal in the mlock option and getting stronger guarantees that this data wouldn't get paged out, but this feels like something that should be an opt-in rather than the default?
   
   That was my plan, allow to enable mlock2 optionally (optionally for 2 reasons: (1) it needs a command line parameter to enable unsafe memory accesses; (2) for the reasons you said before and Robert's issue). I was just waiting for the correct IOContext to control this. If you agree that all files marked with IOContext.LOAD are also candidates for mlock2, then I am fine.
   
   Instead of enabling preload automatically, I tend to make the setPreload setting a tristate or a `BiPredicate<String/*filename*/,IOContext>`. So users can easly preload files they like they did on `mmapdir.setPreload(true)` would be replaced by `setPreload(ALL_FILES)` or like that, `setPreload(false)` would get `setPreload(NO_FILES)` and your current variant something like `setPreload((name, context) -> (context == IOContext.LOAD))`. 
   
   What do you think of making preload better controlable (also for elasticsearch), if we replace the boolean by a `BiPredicate`?
   
   For mlock2 I would add, to allow the same: `setLockPages(BiPredicate<String/*filename*/,IOContext>)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313914337

   Looks fine. I would add 3 predefined constants:
   
   ```java
   public static final BiPredicate<String,Context> ALL_FILES = (file, context) -> true;
   public static final BiPredicate<String,Context> NO_FILES = (file, context) -> false;
   public static final BiPredicate<String,Context> USE_LOAD_CONTEXT = (file, context) -> context.load;
   ```
   
   Then you could also keep the deprecated getter by compaing with ALL_FILES; The Default would be the new one.
   
   What's missing is a little bit of better documentation in `setPreload()` and possibly the class javadocs. It is unknown what the first param "String" of the predicate is, it should document that it is the filename.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313922578

   Unrelated problem with Github CI:
   
   > org.apache.lucene.index.TestIndexFileDeleter.testExcInDecRef
   
   this test failed for me yesterday in the same way. Seems to be a bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314119766

   Maybe we should start again here:
   - sorry for mentioning mlock2, I will think of a way in the future to hook into the MemorySegment and configure it in native ways (e.g. set madvise) on MMapDirectory. I won't implement a methodhandle to mlock2 without haveing the fun of fighting with @rmuir ;-]
   - Maybe split the PR into two parts: (1) adding the improved preload BiPredicate (this would allow Elasticsearch to do whatever they want) and (2) add the IOContext support for those files and also add the example as proposed here.
   
   I would not change the default. That's up to the end-user of Lucene (e.g. Elasticsearch, Solr, Opensearch).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1310976902

   > I think preload is different from mlock, mlock needs way more discussion and personally I'm against it. mlock would be an operational hassle because of default resource limits on linux as well.
   
   This would of course be optional (it must be optional as it also requires extra JVM command line params). I think we should give users the option to do it.
   
   Personally I also tend to be against "load".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313881947

   > If you agree that all files marked with IOContext.LOAD are also candidates for mlock2, then I am fine.
   
   I think it would apply to the same files indeed.
   
   Let me update the PR with your suggested changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314178816

   This got split into #11929 and #11930.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021849653


##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -45,8 +45,8 @@ public enum Context {
   public final boolean readOnce;
 
   /**
-   * This flag indicates that the file is expected to be heavily accessed in a random-access fashion
-   * and should have its content in memory.
+   * This flag indicates that the file is expected to be heavily accessed in a random-access

Review Comment:
   the name of the field is not really applicable for that description. Maybe it should be called different? I dont know a good name, e.g. "tinyRandomAccess"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1020403519


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -235,7 +235,7 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    return PROVIDER.openInput(path, context, chunkSizePower, preload, useUnmapHack);
+    return PROVIDER.openInput(path, context, chunkSizePower, preload || context.load, useUnmapHack);

Review Comment:
   No, the context is passed already and the provider class could decide how to handle the IO context.
   
   But leave it as is for now. If we want two have different behaviour of MemeorySegments and MappedByteBuffer in future, I can refactor it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314152039

   Agreed with your suggestion to reset and split into two PRs @uschindler. I'll close this one and open two new ones.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313979769

   I will open an issue in Solr to adapt their code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] rmuir commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314126519

   Well it was good to mention mlock as that "makes more sense" than just "preload" (which has a confusing name IMO). "preload" just faults it in but if there is memory pressure it can then just be paged out, so that's why i don't think it "does anything".
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1020267346


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -235,7 +235,7 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    return PROVIDER.openInput(path, context, chunkSizePower, preload, useUnmapHack);
+    return PROVIDER.openInput(path, context, chunkSizePower, preload || context.load, useUnmapHack);

Review Comment:
   > So I would move the || context.load to somewhere in MappedByteBufferIndexInputProvider and MemorySegmentIndexInputProvider so they can have different code.
   
   Do you mean making it a separate argument?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1020403519


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -235,7 +235,7 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    return PROVIDER.openInput(path, context, chunkSizePower, preload, useUnmapHack);
+    return PROVIDER.openInput(path, context, chunkSizePower, preload || context.load, useUnmapHack);

Review Comment:
   No, the context is passed already and the provider class could decide how to handle the IO context.
   
   But leave it as is for now. If we want to have different behaviour for MemeorySegments and MappedByteBuffer in future, I can refactor it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313830132

   @uschindler I think I understand Robert's concerns against mlock, but less yours against loading. Do you think that it would be useless to preload if these files can then get paged out? My thinking was that most users would rather like Lucene to take a bit longer to open indices if it helps provide better performance on initial queries, noting that it wouldn't make opening much slower given how tiny these files tend to be compared to the overall dataset (~0.17% of the total index size for the index generated by nightly benchmarks). In general these files would stay in the page cache all the time, but if they don't because of unusual access patterns or competing processes and the page cache decides to page out some data, it wouldn't be much worse than today. I do think some users will see appeal in the mlock option and getting stronger guarantees that this data wouldn't get paged out, but this feels like something that should be an opt-in rather than the default?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313968557

   OK. I wanted to use names that give a sense of when these general `BiPredicate` objects are useful without looking up javadocs, but your point about reusing for mlock makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] rmuir commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314077245

   also wanna note my concern around any use of mlock, that, users may not know how to do the right thing and be lazy and just run as root. this is sadly the stuff that users do, i'd hate to encourage it.
   
   so to me it seems like the heap is more user-friendly way to get the performance that you want.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313888642

   we can keep and deprecate the old boolean setter that delegates to 2 predefined "alaways on", "always off" predicates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313965347

   looks great. I don't like the `PRELOAD_` prefix on the constants, because I plan to use the same for mlock2. I was already thinking about this, but came to no good names. So my proposal was the above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313978152

   Looks good to me, i think maybe say for the constant that it uses LOAD context.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz closed pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz closed pull request #11917: Automatically preload index files that are both tiny and very hot.
URL: https://github.com/apache/lucene/pull/11917


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1019662247


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -235,7 +235,7 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    return PROVIDER.openInput(path, context, chunkSizePower, preload, useUnmapHack);
+    return PROVIDER.openInput(path, context, chunkSizePower, preload || context.load, useUnmapHack);

Review Comment:
   See my last message: It may be better in future to leave the decission to preload or map to the implementation of the provider. For exactly that reason the context is passed to the provider!
   
   So I would move the `|| context.load` to somewhere in MappedByteBufferIndexInputProvider and MemorySegmentIndexInputProvider so they can have different code.
   
   Maybe also rename "load" to something else because it is up to implementation how to guarantee how something is "sticked to memory".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] rmuir commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1310974545

   I think preload is different from mlock, mlock needs way more discussion and personally I'm against it. mlock would be an operational hassle because of default resource limits on linux as well. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] rmuir commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313991795

> @uschindler I think I understand Robert's concerns against mlock, but less yours against loading. Do you think that it would be useless to preload if these files can then get paged out? My thinking was that most users would rather like Lucene to take a bit longer to open indices if it helps provide better performance on initial queries, noting that it wouldn't make opening much slower given how tiny these files tend to be compared to the overall dataset (~0.17% of the total index size for the index generated by nightly benchmarks). In general these files would stay in the page cache all the time, but if they don't because of unusual access patterns or competing processes and the page cache decides to page out some data, it wouldn't be much worse than today. I do think some users will see appeal in the mlock option and getting stronger guarantees that this data wouldn't get paged out, but this feels like something that should be an opt-in rather than the default?

I'm not against preloading i just don't think it does anything here. It just madvises the kernel for read-ahead and then touches every page.

Seems something went badly wrong here: if these files need to be hot, then just load the structures into the heap? all this talk about preloading and locking pages is not good. why were these "small hot files" removed from the heap?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314131627

   I think to get a "real sticky preload to heap", we would need to add a NRTCachingDirectory-like wrapper on top of any Directory that copies the file into ByteBuffersIndexInput.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313905183

   @uschindler Something like that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314009331

   I don't care about the new behaviour, as I still ave the option to configure it. I agree with Robert, if the files should stay on heap load them to heap.
   
   What I like most on this PR is that due to the predicate it is possible to configure easily which files you want to preload.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314111220

   BTW, sorry that I mentioned mlock2... oh man oh man.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021728743


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -85,25 +85,23 @@ public class MMapDirectory extends FSDirectory {
    * Argument for {@link #setPreload(BiPredicate)} that configures all files to be preloaded upon
    * opening them.
    */
-  public static final BiPredicate<String, IOContext> PRELOAD_ALL_FILES =
-      (filename, context) -> true;
+  public static final BiPredicate<String, IOContext> ALL_FILES = (filename, context) -> true;
 
   /**
    * Argument for {@link #setPreload(BiPredicate)} that configures no files to be preloaded upon
    * opening them.
    */
-  public static final BiPredicate<String, IOContext> PRELOAD_NO_FILES =
-      (filename, context) -> false;
+  public static final BiPredicate<String, IOContext> NO_FILES = (filename, context) -> false;
 
   /**
    * Argument for {@link #setPreload(BiPredicate)} that configures all files that use the {@link
    * IOContext#LOAD} to be preloaded upon opening them. This is the default.
    */
-  public static final BiPredicate<String, IOContext> PRELOAD_BASED_ON_IO_CONTEXT =
+  public static final BiPredicate<String, IOContext> BASED_ON_IO_CONTEXT =
       (filename, context) -> context.load;
 
   private boolean useUnmapHack = UNMAP_SUPPORTED;
-  private BiPredicate<String, IOContext> preload = PRELOAD_BASED_ON_IO_CONTEXT;
+  private BiPredicate<String, IOContext> preload = BASED_ON_IO_CONTEXT;

Review Comment:
   maybe `BASED_ON_LOAD_IO_CONTEXT`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] mikemccand commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

mikemccand commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1315303887

   > this test failed for me yesterday in the same way. Seems to be a bug.
   
   This test has been failing for some time -- we have #10878 open for it.  I'll try to make some time to figure out what's happening.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313978716

   ...namining is hard...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314053441

   I reverted the bits that would load some files in physical memory by default. These files don't have to be in memory and I actually like that these files are not in the heap because it makes lots of things easier, e.g. you don't have to size our heap depending on how much data you're dealing with - 0.17% is still 1.75GB of memory if you have 1TB worth of index data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313887463

   Agreed, this would be a welcome simplification.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021850600


##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -45,8 +45,8 @@ public enum Context {
   public final boolean readOnce;
 
   /**
-   * This flag indicates that the file is expected to be heavily accessed in a random-access fashion
-   * and should have its content in memory.
+   * This flag indicates that the file is expected to be heavily accessed in a random-access

Review Comment:
   But what is tiny? 1.6 Gigabytes?!?!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1020409331


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -235,7 +235,7 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    return PROVIDER.openInput(path, context, chunkSizePower, preload, useUnmapHack);
+    return PROVIDER.openInput(path, context, chunkSizePower, preload || context.load, useUnmapHack);

Review Comment:
   I would have moved the decission to those 2 places, no extra param needed:
   - https://github.com/apache/lucene/blob/57ac311c709da2f398ea73896eae158b1126c5b4/lucene/core/src/java/org/apache/lucene/store/MappedByteBufferIndexInputProvider.java#L77
   - https://github.com/apache/lucene/blob/57ac311c709da2f398ea73896eae158b1126c5b4/lucene/core/src/java19/org/apache/lucene/store/MemorySegmentIndexInputProvider.java#L54
   
   But thats not important. This issue here was exactly "planned" when I made the provider's signature. Currently the context is not used downstream.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on a diff in pull request #11917: Automatically preload index files that are both tiny and very hot.

Posted by GitBox <gi...@apache.org>.

uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1020410114


##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -235,7 +235,7 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
     ensureOpen();
     ensureCanRead(name);
     Path path = directory.resolve(name);
-    return PROVIDER.openInput(path, context, chunkSizePower, preload, useUnmapHack);
+    return PROVIDER.openInput(path, context, chunkSizePower, preload || context.load, useUnmapHack);

Review Comment:
   So for now keep it as is!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org