You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2023/01/12 19:12:20 UTC

[GitHub] [lucene] Brain2000 opened a new issue, #12082: LeafFieldComparator setBottom not being called before compareBottom

Brain2000 opened a new issue, #12082:
URL: https://github.com/apache/lucene/issues/12082

   ### Description
   
   It looks like there's a problem in the TopFieldCollector.java where it calls "compareBottom" without calling "setBottom" first.
   I believe this is an issue if getLeafComparer( ) is called more than once, as the new class will no longer have the previous bottom slot value.
   
   I found a workaround is to rescope the variable storing the bottom slot set from calling "setBottom( )", to outside of the LeafFieldComparator class to make it persistent (just like the setTopValue( ) is called outside of the LeafFieldComparator class).
   
   Rescoping the bottom slot should be safe unless multiple threads are being run at the same time and need their own version of bottomSlot for the LeafFieldComparator class.
   
   The documentation here 
     https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/LeafFieldComparator.html
   states that _"[setBottom] will always be called before compareBottom(int)"_
   
   So one might think the LeafFieldComparator would be used like this:
   
   ```
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx); 
   lc.compareBottom(yyy);
   
   //get new leaf comparator
   lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx2);
   lc.compareBottom(yyy2);
   ```
   
   However, it is being called in this fashion:
   
   ```
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx); 
   lc.compareBottom(yyy);
   
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //compareBottom... hopefully the slot is correct...
   lc.compareBottom(yyy);
   ```
   
   As you can see, the 2nd set of code does not treat the LeafComparator as if each instance has it's own bottomSlot variable. I believe the purpose of this would be to allow multiple threads to call this simultaneously, which would require each one to have it's own bottomSlot, otherwise it would not be thread safe.
   
   Also, if setBottom was meant to be outside of the scope of LeafComparator, then why was the time taken to put setTop outside and setBottom inside? This seems to have been done very deliberately according to the documentation.
   
   Here is a sample code snippet:
   ```
   import org.apache.lucene.index.LeafReaderContext;
   import org.apache.lucene.search.FieldComparator;
   import org.apache.lucene.search.LeafFieldComparator;
   import org.apache.lucene.search.Scorable;
   
   public final class MySimpleStringComparator extends FieldComparator<String> {
   	private String[] values;
   	private final String field;
   	private String topValue;
   
   	//SCOPING bottomSlot HERE WILL CAUSE ITEMS TO RETURN PROPERLY
   	//private int bottomSlot;
   
   	public MySimpleStringComparator(int numHits, String field) {
   		this.values = new String[numHits];
   		this.field = field;
   	}
   
   	@Override
   	public int compare(int slot1, int slot2) {
   		return this.compareValues(this.values[slot1], this.values[slot2]);
   	}
   
   	@Override
   	public LeafFieldComparator getLeafComparator(LeafReaderContext lrc) throws IOException {
   		final LeafReaderContext frc = lrc;
   
   		return new LeafFieldComparator() {
   			//SCOPING bottomSlot HERE WILL CAUSE ITEMS ON A SUBSET TO RANDOMLY DISAPPEAR
   			private int bottomSlot;
   
   			public void setScorer(Scorable arg0) throws IOException {
   			}
   
   			public void setBottom(int slot) throws IOException {
   				bottomSlot = slot;
   			}
   
   			public void copy(int slot, int doc) throws IOException {
   				values[slot] = frc.reader().document(doc).get(field);
   			}
   
   			public int compareTop(int doc) throws IOException {
   				return compareValues(topValue, frc.reader().document(doc).get(field));
   			}
   
   			public int compareBottom(int doc) throws IOException {
   				return compareValues(values[bottomSlot], frc.reader().document(doc).get(field));
   			}
   		};
   	}
   
   	@Override
   	public void setTopValue(String val) {
   		this.topValue = val;
   	}
   
   	@Override
   	public String value(int slot) {
   		return this.values[slot];
   	}
   }
   ```
   
   Here is a code snippet that runs a search/sort for a single page that utilizes the MySimpleStringComparator above.
   ```
   ComplexPhraseQueryParser cpqp = new ComplexPhraseQueryParser("somefield", analyzer);
   Query query = cpqp.parse("somevalue");
   
   pageSize = 10;
   pageNum = 1;
   requestedRecords = pageSize * pageNum;
   startOffset = (pageNum - 1) * pageSize;
   
   FieldComparatorSource fsc = new FieldComparatorSource() {
   	@Override
   	public FieldComparator<String> newComparator(String fieldname, int numhits, int sortPos, boolean reversed) {
   		return new MySimpleStringComparator(numhits, fieldname);
   	}
   };
   
   Sort sort = new Sort(new SortField("firstname", fsc, false));
   IndexSearcher searcher = new IndexSearcher(reader);
   TopFieldCollector tfcollector = TopFieldCollector.create(sort, requestedRecords, Integer.MAX_VALUE);
   searcher.search(query, tfcollector);
   ScoreDoc[] hits = tfcollector.topDocs(startOffset, pageSize).scoreDocs;
   ```
   
   ### Version and environment details
   
   This is running on Windows Server 2019 using Tomcat 8.5 with Lucene 8.11.2 (also reproduceable in 8.6.0)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on issue #12082: LeafFieldComparator setBottom not being called before compareBottom

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on issue #12082:
URL: https://github.com/apache/lucene/issues/12082#issuecomment-1404967722

   It's been a long time, but I would guess that `setBottom` was put on `LeafFieldComparator` instead of `FieldComparator` because it is always called from a `LeafCollector` rather than a `Collector` (in contrast to e.g. `setTop`). It could make sense to move it to `Comparator`, but this would force `LeafCollectors` produced by top-field comparators to keep track of both the `FieldComparator` and the `LeafFieldComparator`. In the end I don't have a strong opinion on this question, there seem to be pros/cons with both having `setBottom` on `Comparator` and `LeafFieldComparator`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] vigyasharma commented on issue #12082: LeafFieldComparator setBottom not being called before compareBottom

Posted by GitBox <gi...@apache.org>.
vigyasharma commented on issue #12082:
URL: https://github.com/apache/lucene/issues/12082#issuecomment-1396549638

   I think you're right that `bottom` should be scoped outside the `LeafFieldComparator`. It stores the bottom slot value for competitive hits and should survive across leaf contexts.
   
   I checked a few FieldComparator implementations however, and I do see  it scoped outside the LeafFieldComparator. For e.g. [DoubleComparator](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/DoubleComparator.java#L32), and [DocComparator](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java#L31)
   
   This also seems to be the case in Lucene 8.11.2 ([[1]](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/comparators/DoubleComparator.java#L33), [[2]](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java#L34))
   
   Can you share code references/links for some comparators where you see this is an issue? Or perhaps a test which reproduces this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org