You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/19 20:37:02 UTC

[GitHub] [lucene] shahrs87 opened a new pull request, #907: LUCENE-10357 Ghost fields and postings/points

shahrs87 opened a new pull request, #907:
URL: https://github.com/apache/lucene/pull/907

   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Lucene:
   
   * https://issues.apache.org/jira/projects/LUCENE
   
   You will need to create an account in Jira in order to create an issue.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * LUCENE-####: <short description of problem or changes>
   
   LUCENE must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   Please provide a short description of the changes you're making with this pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
   - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1256448795

   I was busy with some other security related work at my day job so couldn't update this PR. Apologies for that.
   @jpountz  Can you please review this PR again ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1132617984

   Thanks for tackling this! Your are going in the right direction. Could we drop most of the `if (terms != Terms.EMPTY)` checks and handle them like any non-null `Terms` instance?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #907:
URL: https://github.com/apache/lucene/pull/907#discussion_r896529395


##########
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/BloomFilteringPostingsFormat.java:
##########
@@ -200,8 +200,8 @@ public Terms terms(String field) throws IOException {
         return delegateFieldsProducer.terms(field);
       } else {
         Terms result = delegateFieldsProducer.terms(field);
-        if (result == null) {
-          return null;
+        if (result == null || result == Terms.EMPTY) {

Review Comment:
   This case is a bit special indeed, but I think we should fix it too to make sure that it only returns a `null` `Terms` instance if the field doesn't exist (fieldInfo == null) or if the field doesn't index terms (indexOptions == NONE).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1183747929

   Thank you @jpountz  for being so patient with me. I tried your above suggestion and hit the following problem.
   Locally I removed the following check from BloomFilteringPostingsFormat.java.
   ```
           if (result == Terms.EMPTY) {
             return Terms.EMPTY;
           }
   ```
   
   The following test failed: `org.apache.lucene.index.TestPostingsOffsets.testCrazyOffsetGap`
   Can be reproduced by:
    `gradlew :lucene:core:test --tests "org.apache.lucene.index.TestPostingsOffsets.testCrazyOffsetGap" -Ptests.jvms=8 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=66D42010A32F9625 -Ptests.locale=chr -Ptests.timezone=Asia/Jayapura -Ptests.gui=false -Ptests.file.encoding=UTF-8`
   
   The exception stack trace is:
   ```
   field "foo" should have hasFreqs=true but got false
   org.apache.lucene.index.CheckIndex$CheckIndexException: field "foo" should have hasFreqs=true but got false
   	at __randomizedtesting.SeedInfo.seed([66D42010A32F9625:91A6069AD047326B]:0)
   	at app//org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1434)
   	at app//org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2425)
   	at app//org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:999)
   	at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:714)
   	at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:552)
   	at app//org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:343)
   	at app//org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
   	at app//org.apache.lucene.index.TestPostingsOffsets.testCrazyOffsetGap(TestPostingsOffsets.java:462)
   ```
   
   The terms object [here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1380) is of type [PerFieldPostingsFormat#FieldsReader](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/perfield/PerFieldPostingsFormat.java#L352)
   The fieldsProducer object within  PerFieldPostingsFormat#FieldsReader is of type [BloomFilteringPostingsFormat#BloomFilteredFieldsProducer](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/BloomFilteringPostingsFormat.java#L202)
   The delegateFieldsProducer within BloomFilteringPostingsFormat#BloomFilteredFieldsProducer is of type [Lucene90BlockTreeTermsReader](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java#L291)
   
   This is the code snippet which I changed within `Lucene90BlockTreeTermsReader#terms` method
   ```
     @Override
     public Terms terms(String field) throws IOException {
       assert field != null;
       Terms terms = fieldMap.get(field);
       return terms == null ? Terms.EMPTY : terms;
     }
   ```
   
   From your suggestion instead of returning Terms.EMPTY, I thought to return Terms.empty(fieldInfo) with overloaded hasFreqs, hasPositions, etc. methods. But the problem is there is no way to get hold of `FieldsInfo` object from `field` string. The fieldMap map within Lucene90BlockTreeTermsReader is empty.  Is it ok to change the method argument for terms method from field String to fieldInfo object within Lucene90BlockTreeTermsReader ?  `public Terms terms(String field) throws IOException` --> `public Terms terms(FieldInfo fieldInfo) throws IOException` I think NO but just wanted to ask.
   
   Please correct me if I am misunderstanding anything. Thank you again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #907:
URL: https://github.com/apache/lucene/pull/907#discussion_r900220890


##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -1378,7 +1378,7 @@ private static Status.TermIndexStatus checkFields(
       computedFieldCount++;
 
       final Terms terms = fields.terms(field);
-      if (terms == null) {
+      if (terms == Terms.EMPTY) {

Review Comment:
   Let's remove this `if` block entirely?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -1378,7 +1378,7 @@ private static Status.TermIndexStatus checkFields(
       computedFieldCount++;
 
       final Terms terms = fields.terms(field);
-      if (terms == null) {
+      if (terms == Terms.EMPTY) {

Review Comment:
   Then let's fix codecs to return a `Terms` instance that has the correct values for `hasFreqs`, `hasOffsets`, `hasPositions` and `hasPayloads`? E.g. maybe you could add a new `Terms#empty(FieldInfo)` method that does the right thing based on the `FieldInfo` and leverage this method in postings formats?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1171746326

   > could you now try to remove all instances of if (terms == Terms.EMPTY)?
   
    @jpountz, I tried to remove all the instances of `if (terms == Terms.EMPTY)?` but couldn't remove the remaining ones in the patch. Otherwise it will cause test failures.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1158977379

   Thanks @shahrs87, could you now try to remove all instances of `if (terms == Terms.EMPTY)`? Hopefully existing logic should work with Terms instances regardless of whether they are empty or not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1310330397

   Apologies Luca, but after looking more at your changes, I'm getting worried that this change is harder than I had anticipated. I was optimistically hoping that never returning null PointValues would automatically help save some `NullPointerException`s but looking at your change it looks like there are many places where we just need to replace these null checks with other checks if we want to keep the logic correct, so things wouldn't magically work on segments that have ghost fields unless we explicitly test ghost fields. So I would suggest closing this PR, apologies for the time you spent on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on a diff in pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on code in PR #907:
URL: https://github.com/apache/lucene/pull/907#discussion_r905561688


##########
lucene/core/src/java/org/apache/lucene/index/FrozenBufferedUpdates.java:
##########
@@ -595,7 +595,7 @@ private void setField(String field) throws IOException {
 
     DocIdSetIterator nextTerm(String field, BytesRef term) throws IOException {
       setField(field);
-      if (termsEnum != null) {
+      if (termsEnum != null && termsEnum != TermsEnum.EMPTY) {

Review Comment:
   The following test failed `TestPointQueries#testAllPointDocsWereDeletedAndThenMergedAgain`
   ```
   Cannot read field "bytes" because "other" is null
   java.lang.NullPointerException: Cannot read field "bytes" because "other" is null
   	at __randomizedtesting.SeedInfo.seed([16B9FF96CE2E2AF5:59A84BA7E3282B90]:0)
   	at org.apache.lucene.util.BytesRef.compareTo(BytesRef.java:159)
   	at org.apache.lucene.index.FrozenBufferedUpdates$TermDocsIterator.nextTerm(FrozenBufferedUpdates.java:604)
   	at org.apache.lucene.index.FrozenBufferedUpdates.applyTermDeletes(FrozenBufferedUpdates.java:473)
   	at org.apache.lucene.index.FrozenBufferedUpdates.apply(FrozenBufferedUpdates.java:175)
   	at org.apache.lucene.index.IndexWriter.forceApply(IndexWriter.java:5965)
   	at org.apache.lucene.index.IndexWriter.tryApply(IndexWriter.java:5865)
   	at org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$10(IndexWriter.java:2771)
   	at org.apache.lucene.index.IndexWriter$EventQueue.processEventsInternal(IndexWriter.java:328)
   	at org.apache.lucene.index.IndexWriter$EventQueue.processEvents(IndexWriter.java:317)
   	at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5708)
   	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4038)
   	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3995)
   	at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:2099)
   	at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:2080)
   	at org.apache.lucene.search.TestPointQueries.testAllPointDocsWereDeletedAndThenMergedAgain(TestPointQueries.java:1221)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
   	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #907:
URL: https://github.com/apache/lucene/pull/907#discussion_r880758668


##########
lucene/core/src/java/org/apache/lucene/index/MappedMultiFields.java:
##########
@@ -43,8 +43,8 @@ public MappedMultiFields(MergeState mergeState, MultiFields multiFields) {
   @Override
   public Terms terms(String field) throws IOException {
     MultiTerms terms = (MultiTerms) in.terms(field);
-    if (terms == null) {
-      return null;
+    if (terms == null || terms == Terms.EMPTY) {

Review Comment:
   can we leave the `if` statement as is?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -1378,7 +1378,7 @@ private static Status.TermIndexStatus checkFields(
       computedFieldCount++;
 
       final Terms terms = fields.terms(field);
-      if (terms == null) {
+      if (terms == Terms.EMPTY) {

Review Comment:
   We should remove this `if` statement. There is a check a few lines above that indexing is enabled on the field, so terms must not be null.



##########
lucene/core/src/java/org/apache/lucene/index/FrozenBufferedUpdates.java:
##########
@@ -595,7 +595,7 @@ private void setField(String field) throws IOException {
 
     DocIdSetIterator nextTerm(String field, BytesRef term) throws IOException {
       setField(field);
-      if (termsEnum != null) {
+      if (termsEnum != null && termsEnum != TermsEnum.EMPTY) {

Review Comment:
   would it work to leave the `if` statement as is?



##########
lucene/test-framework/src/java/org/apache/lucene/tests/codecs/asserting/AssertingPostingsFormat.java:
##########
@@ -79,7 +79,10 @@ public Iterator<String> iterator() {
     @Override
     public Terms terms(String field) throws IOException {
       Terms terms = in.terms(field);
-      return terms == null ? null : new AssertingLeafReader.AssertingTerms(terms);
+      if (terms == Terms.EMPTY) {

Review Comment:
   let's remove this check, and assert that `terms` in not null (`terms(String field)` map only get called on codec APIs if the field is indexed



##########
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/BloomFilteringPostingsFormat.java:
##########
@@ -200,8 +200,8 @@ public Terms terms(String field) throws IOException {
         return delegateFieldsProducer.terms(field);
       } else {
         Terms result = delegateFieldsProducer.terms(field);
-        if (result == null) {
-          return null;
+        if (result == null || result == Terms.EMPTY) {

Review Comment:
   since there are no ghost fields anymore, we should be able to remove the `if` statement entirely, does it cause test failures?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1181518177

   @shahrs87 Can you look into removing all other instances of `terms == Terms.EMPTY` or `terms != Terms.EMPTY` as well? To do this while keeping tests passing, I think you'll need to create empty `Terms` instances that still honor the options of the `FieldInfo` as per my previous suggestion. E.g. you could add a new `Terms#empty(FieldInfo)` helper method that does the right thing for `hasFreqs()`, `hasPositions()`, etc. by looking at the index options of the `FieldInfo`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on a diff in pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on code in PR #907:
URL: https://github.com/apache/lucene/pull/907#discussion_r905458119


##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -1378,7 +1378,7 @@ private static Status.TermIndexStatus checkFields(
       computedFieldCount++;
 
       final Terms terms = fields.terms(field);
-      if (terms == null) {
+      if (terms == Terms.EMPTY) {

Review Comment:
   I tried to remove this if (terms == Terms.EMPTY) statement but many tests failed.
   Example of failing test: `org.apache.lucene.codecs.lucene90.TestLucene90NormsFormat#testUndeadNorms`
   Stack trace:
   ```
   > Task :lucene:core:test FAILED
   WARNING: A command line option has enabled the Security Manager
   WARNING: The Security Manager is deprecated and will be removed in a future release
   
   org.apache.lucene.codecs.lucene90.TestLucene90NormsFormat > testUndeadNorms FAILED
       org.apache.lucene.index.CheckIndex$CheckIndexException: field "content" should have hasFreqs=true but got false
           at __randomizedtesting.SeedInfo.seed([2A7308C15B316422:27A64CAC8EDF759E]:0)
           at app//org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1437)
           at app//org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2428)
           at app//org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:999)
           at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:714)
           at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:552)
           at app//org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:343)
           at app//org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
           at app//org.apache.lucene.tests.index.BaseNormsFormatTestCase.testUndeadNorms(BaseNormsFormatTestCase.java:698)
           at java.base@17.0.2/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base@17.0.2/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
           at java.base@17.0.2/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base@17.0.2/java.lang.reflect.Method.invoke(Method.java:568)
           at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
           at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
           at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
           at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
           at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
           at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
           at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
           at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
           at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
           at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
   ```
   
   I think the problem is:
   Even though the `content` field is deleted, it is still present in fields returned by `reader#getPostingsReader` [here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L2417) 
   Since the returned terms is EMPTY it has `hasFreqs` set to `false` but since this field exists in fields and the indexOptions set to `DOCS_AND_FREQS_AND_POSITIONS` it is expecting hasFreqs to true. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1175446230

   @jpountz  There are no null or terms.EMPTY checks in CheckIndex class anymore. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1282791420

   @jpountz  Can you please review this patch again? Thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1156823060

   @jpountz  Made one more attempt to fix test failures. Can you please take a look again ? Thank you fir being patient with me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10357 Ghost fields and postings/points [lucene]

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1790591979

   Thank you for persisting so hard on this one @shahrs87 -- I'm sorry it looks like we should close it at this point, but your efforts / iterations were needed to see that we are mostly exchanging one `if` for another.
   
   The whole `null` vs `EMPTY` is such a challenge (for Lucene anyways).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10357 Ghost fields and postings/points [lucene]

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand closed pull request #907: LUCENE-10357 Ghost fields and postings/points
URL: https://github.com/apache/lucene/pull/907


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1132206775

   Hi @jpountz,
   I have tried to put up a patch from the suggestions you have made in [LUCENE-10357](https://issues.apache.org/jira/browse/LUCENE-10357). Can you please review and provide feedback that I am going in the right direction ? Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1180628206

   @jpountz  Hi Adrian, can you please make one more pass over the PR and provide your feedback ? Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shahrs87 commented on a diff in pull request #907: LUCENE-10357 Ghost fields and postings/points

Posted by GitBox <gi...@apache.org>.
shahrs87 commented on code in PR #907:
URL: https://github.com/apache/lucene/pull/907#discussion_r896006974


##########
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/BloomFilteringPostingsFormat.java:
##########
@@ -200,8 +200,8 @@ public Terms terms(String field) throws IOException {
         return delegateFieldsProducer.terms(field);
       } else {
         Terms result = delegateFieldsProducer.terms(field);
-        if (result == null) {
-          return null;
+        if (result == null || result == Terms.EMPTY) {

Review Comment:
   Yes, this test case is failing even with this patch: 
   `TestMemoryIndexAgainstDirectory#testRandomQueries` 
   Reproducible by: `gradlew :lucene:memory:test --tests "org.apache.lucene.index.memory.TestMemoryIndexAgainstDirectory.testRandomQueries" -Ptests.jvms=8 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=B19145C39C34BD03 -Ptests.gui=false -Ptests.file.encoding=UTF-8`
   
   The underlying reader it is using is MemoryIndex#MemoryIndexReader [here](https://github.com/apache/lucene/blob/main/lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java#L1405)
   This is the relevant snippet.
   ```
           if (info == null || info.numTokens <= 0) {
             return null;
           }
   ```
   
   Below is the text I copied from LUCENE-10357 description.
   
   > I fear that this could be a source of bugs, as a caller could be tempted to assume that he would get non-null terms on a FieldInfo that has IndexOptions that are not NONE. Should we introduce a contract that FieldsProducer (resp. PointsReader) must return a non-null instance when postings (resp. points) are indexed?
   
   I don't know which all places I need to do null check ? From the above description, looks like only in FieldsProducer related classes. From my limited understanding, this doesn't look like FiledsProducer. @jpountz  please advise.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org