You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/12/06 03:14:43 UTC

[GitHub] [lucene] rmuir opened a new pull request, #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

rmuir opened a new pull request, #11998:
URL: https://github.com/apache/lucene/pull/11998

   Currently the stored fields and term vectors apis on the index are "stateless".
   Unlike the other parts of the APIs, users can't call any iterators/enumerators, they just do stuff like:
   ```
   indexReader.document(0);
   indexReader.document(1);
   ... (up to potentially thousands of docs because lusers do that)
   ```
   
   Instead of adding any real iterator, threadlocals were added to prevent from having to clone() the reader on every document. For example this could reduce the amount of NIOFS buffer refills and so on.
   
   But the old API from a previous time, only gets worse these days, because the implementations are more complicated and do block-compression, dictionaries, etc.
   
   The threadlocals in segmentreader can cause memory issues if you have tons of segments, tons of threads, or especially both. Seems plenty of java developers can't help but run into it.
   
   I propose we deprecate these APIs and let the user get the iterator themselves e.g. per-search, without any threadlocal.
   ```
   StoredFields fields = indexReader.storedFields();
   for (docs in results) {
      dosomethingwith = fields.document(n);
   }
   // now fields can be gc'd
   ```
   It will re-use the datastructures if someone has thousands and thousands of hits, but avoid the threadlocal pain.
   
   NOTES: this is just a draft to demonstrate the idea. I'm not sure i have the resources to see it through, since it is a lot of labor.
   The old APIs/deprecations are here, and if you use deprecated methods, you will use threadlocals just like before. But if you don't call deprecated APIs, then no threadlocals are used.
   
   I didn't cut over all tests (which would be enormous effort), and some tests will fail with `java.lang.UnsupportedOperationException: deprecated document access is not supported`. That's because I don't want threadlocal nonsense to support deprecated stuff in CodecReader (belongs only in SegmentReader). We should keep CodecReader clean. Unfortunately lots of tests like to wrap their readers with codec/merging readers, so if they do that, these tests really need to be fixed to get off the deprecated stuff, so they will pass again.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1343043440

   I pushed most call sites I think. The main remaining ones are in lucene/highlighter, which require a bit more changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1339780615

   That's fine. or we could fix `newSearcher` to not wrap with crazy CodecReader's. or we could fix said CodecReaders (since they are only used for tests) to implement the deprecated document apis, like SegmentReader does. Or we could give CodecReader a default impl that isn't very performant other than UOE.
   
   I wanted to throw the UOE, at least at first, to be sure i knew exactly what was calling old .document API (e.g. in case i forgot to fix a filter-reader). But it doesn't have to stay.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1454871129

   This was done on purpose as there's no such guarantee on `Terms` instances. Of particular concern are the ones returned back from `TermVectors` class. 
   
   You can see above in the iterations on this PR that we beefed up thread-confinement tests and documentation for both `StoredFields` and `TermVectors`: these data structures were always thread-confined before by `ThreadLocal` variables. Now it is on the user and (unfortunately) it opens up the possibility of mistakes. 
   
   We no longer "hand-hold" users with `ThreadLocal` variables, but at the same time, it allows for large memory reduction for applications that don't need them.
   
   I don't think MultiTerms/Terms should be accessed by multiple threads either, there's no guarantee that here, that there isnt some unsafe publishing of variables updated or something like that: please don't cache them. If you insist on caching them, maybe consider using your own ThreadLocal for that purpose.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir merged pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir merged PR #11998:
URL: https://github.com/apache/lucene/pull/11998


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1345665387

   +1 to improve docs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1043616431


##########
lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java:
##########
@@ -268,6 +270,12 @@ public final class MoreLikeThis {
   /** IndexReader to use */
   private final IndexReader ir;
 
+  /** Stored fields for {@code ir}. */
+  private final StoredFields storedFields;

Review Comment:
   Good call, thanks for catching this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] hydrogen666 commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by "hydrogen666 (via GitHub)" <gi...@apache.org>.
hydrogen666 commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1556962751

   In previous version `StoredFieldsReader` is cached in `ThreadLocal`, but now we need to `clone` `StoredFieldsReader` every time if we need to visit store fields. Will this PR cause any performance issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1339823026

   I got the tests happy for now with 12a5dfaeba954a049675830eabd54bd8f58b51c2
   
   Maybe not the right solution in the end, but makes it easier to iterate when you have passing tests at least.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dsmiley commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by "dsmiley (via GitHub)" <gi...@apache.org>.
dsmiley commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1454862916

   Love the change here to migrate away from per-segment-per-threadlocals!
   
   When [porting this to Solr](https://github.com/apache/solr/pull/1360), I noticed a new assertion that previously didn't exist.  AssertingLeafReader now ensures that a Terms instance is only ever accessed by the thread that created it.  Terms does not document wether it's thread-safe or not but I don't believe I've ever encountered one that wasn't thread-safe.  Furthermore, the result of MultiTerms.getTerms is definitely thread-safe and can be useful to cache as there is some overhead.  Thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1043343548


##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseStoredFieldsFormatTestCase.java:
##########
@@ -813,6 +837,7 @@ public void testBulkMergeWithDeletes() throws IOException {
   }
 
   /** mix up field numbers, merge, and check that data is correct */
+  @AwaitsFix(bugUrl = "WTF with this test")

Review Comment:
   thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1045086896


##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java:
##########
@@ -113,34 +116,84 @@ public Fields getTermVectors(int docID) throws IOException {
     return fields == null ? null : new AssertingFields(fields);
   }
 
+  @Override
+  public TermVectors termVectors() throws IOException {
+    return new AssertingTermVectors(super.termVectors());
+  }
+
+  @Override
+  public StoredFields storedFields() throws IOException {
+    return new AssertingStoredFields(super.storedFields());
+  }
+
+  /** Wraps a StoredFields but with additional asserts */
+  public static class AssertingStoredFields extends StoredFields {
+    private final StoredFields in;
+    private final Thread creationThread = Thread.currentThread();
+
+    public AssertingStoredFields(StoredFields in) {
+      this.in = in;
+    }
+
+    @Override
+    public void document(int docID, StoredFieldVisitor visitor) throws IOException {
+      assertThread("StoredFields", creationThread);
+      in.document(docID, visitor);
+    }
+  }
+
+  /** Wraps a TermVectors but with additional asserts */
+  public static class AssertingTermVectors extends TermVectors {
+    private final TermVectors in;
+    private final Thread creationThread = Thread.currentThread();
+
+    public AssertingTermVectors(TermVectors in) {
+      this.in = in;
+    }
+
+    @Override
+    public Fields get(int doc) throws IOException {
+      assertThread("TermVectors", creationThread);
+      Fields fields = in.get(doc);
+      return fields == null ? null : new AssertingFields(fields);
+    }
+  }
+
   /** Wraps a Fields but with additional asserts */
   public static class AssertingFields extends FilterFields {
+    private final Thread creationThread = Thread.currentThread();

Review Comment:
   +1 to check Terms and Fields too. Most instances returned by codecs should be thread-safe but it's still good to make sure consumers don't share them across threads for consistency with other codec APIs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] hydrogen666 commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by "hydrogen666 (via GitHub)" <gi...@apache.org>.
hydrogen666 commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1557118792

   > In previous version, `StoredFieldsReader` is cached in `ThreadLocal`, but now we need to `clone` `StoredFieldsReader` every time if we need to visit store fields. Will this PR cause any performance issue?
   
   It is reasonable to clone `StoredFieldsReader` in one `IndexSearcher` context because one search request may hit many docs, but in some circumstances such as get by `_id` in Elasticsearch, cloning `StoredFieldsReader` every time may cause performance issue? Does my aforementioned concern make sense?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1043614087


##########
lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java:
##########
@@ -268,6 +270,12 @@ public final class MoreLikeThis {
   /** IndexReader to use */
   private final IndexReader ir;
 
+  /** Stored fields for {@code ir}. */
+  private final StoredFields storedFields;

Review Comment:
   hmm, should we do it this way as instance members? `MoreLikeThis` was thread-safe before... i think. This would now make it unsafe. Maybe we should just create these in the actual method that wants to pull from storedfields?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1043042489


##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseStoredFieldsFormatTestCase.java:
##########
@@ -813,6 +837,7 @@ public void testBulkMergeWithDeletes() throws IOException {
   }
 
   /** mix up field numbers, merge, and check that data is correct */
+  @AwaitsFix(bugUrl = "WTF with this test")

Review Comment:
   What is the problem? I'm not too familiar with it but it seems to test that merging correctly de-optimizes bulk merges when field numbers are not aligned, which makes sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
dweiss commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1045027949


##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java:
##########
@@ -113,34 +116,84 @@ public Fields getTermVectors(int docID) throws IOException {
     return fields == null ? null : new AssertingFields(fields);
   }
 
+  @Override
+  public TermVectors termVectors() throws IOException {
+    return new AssertingTermVectors(super.termVectors());
+  }
+
+  @Override
+  public StoredFields storedFields() throws IOException {
+    return new AssertingStoredFields(super.storedFields());
+  }
+
+  /** Wraps a StoredFields but with additional asserts */
+  public static class AssertingStoredFields extends StoredFields {
+    private final StoredFields in;
+    private final Thread creationThread = Thread.currentThread();
+
+    public AssertingStoredFields(StoredFields in) {
+      this.in = in;
+    }
+
+    @Override
+    public void document(int docID, StoredFieldVisitor visitor) throws IOException {
+      assertThread("StoredFields", creationThread);
+      in.document(docID, visitor);
+    }
+  }
+
+  /** Wraps a TermVectors but with additional asserts */
+  public static class AssertingTermVectors extends TermVectors {
+    private final TermVectors in;
+    private final Thread creationThread = Thread.currentThread();
+
+    public AssertingTermVectors(TermVectors in) {
+      this.in = in;
+    }
+
+    @Override
+    public Fields get(int doc) throws IOException {
+      assertThread("TermVectors", creationThread);
+      Fields fields = in.get(doc);
+      return fields == null ? null : new AssertingFields(fields);
+    }
+  }
+
   /** Wraps a Fields but with additional asserts */
   public static class AssertingFields extends FilterFields {
+    private final Thread creationThread = Thread.currentThread();

Review Comment:
   +1. It's not just thread-safety but strict thread-confinement. I assume we're fine with this stricter rule within tests/codebase. I don't see any reason why these objects should be passed around (?).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1344550233

   I cutover all the remaining stuff, by nuking the old api locally and making sure full gradle check passes.
   
   attached is the patch i used to nuke the old APIs.
   [nukeDeprecated.patch.txt](https://github.com/apache/lucene/files/10196690/nukeDeprecated.patch.txt)
   
   It reveals two more things to fix:
   * Fix test-framework's FieldFilterLeafReader to implement new APIs
   * Look at test-framework's AssertingLeafReader to see if we can add safety checks to new APIs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1045078398


##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java:
##########
@@ -113,34 +116,84 @@ public Fields getTermVectors(int docID) throws IOException {
     return fields == null ? null : new AssertingFields(fields);
   }
 
+  @Override
+  public TermVectors termVectors() throws IOException {
+    return new AssertingTermVectors(super.termVectors());
+  }
+
+  @Override
+  public StoredFields storedFields() throws IOException {
+    return new AssertingStoredFields(super.storedFields());
+  }
+
+  /** Wraps a StoredFields but with additional asserts */
+  public static class AssertingStoredFields extends StoredFields {
+    private final StoredFields in;
+    private final Thread creationThread = Thread.currentThread();
+
+    public AssertingStoredFields(StoredFields in) {
+      this.in = in;
+    }
+
+    @Override
+    public void document(int docID, StoredFieldVisitor visitor) throws IOException {
+      assertThread("StoredFields", creationThread);
+      in.document(docID, visitor);
+    }
+  }
+
+  /** Wraps a TermVectors but with additional asserts */
+  public static class AssertingTermVectors extends TermVectors {
+    private final TermVectors in;
+    private final Thread creationThread = Thread.currentThread();
+
+    public AssertingTermVectors(TermVectors in) {
+      this.in = in;
+    }
+
+    @Override
+    public Fields get(int doc) throws IOException {
+      assertThread("TermVectors", creationThread);
+      Fields fields = in.get(doc);
+      return fields == null ? null : new AssertingFields(fields);
+    }
+  }
+
   /** Wraps a Fields but with additional asserts */
   public static class AssertingFields extends FilterFields {
+    private final Thread creationThread = Thread.currentThread();

Review Comment:
   I had the idea to add these checks for stored/fields vectors just to fail a test clearly rather than with some "crazy" behavior that would be difficult to debug. 
   
   Looks like i wasn't the first person to have this idea, as @jpountz already added most of them 8 years ago: https://github.com/apache/lucene/commit/fc94b0b4d9e0
   
   For the postings API, previously we had existing checks on `TermsEnum`, `PostingsEnum`. But `Terms` and `Fields` were not checked. Seems like an oversight to me, they are all in the same codec postings api (also used by term vectors). So, depending on codec's implementation, codec might do e.g. any number of disk reads or whatever it wants and these should be checked too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #11998:
URL: https://github.com/apache/lucene/pull/11998#discussion_r1043183580


##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseStoredFieldsFormatTestCase.java:
##########
@@ -813,6 +837,7 @@ public void testBulkMergeWithDeletes() throws IOException {
   }
 
   /** mix up field numbers, merge, and check that data is correct */
+  @AwaitsFix(bugUrl = "WTF with this test")

Review Comment:
   I pushed a fix for this test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org