Posted to dev@solr.apache.org by Wei <we...@gmail.com> on 2023/05/11 21:02:36 UTC

Question for customize index segment search order

Hi,

We have an index with multiple segments produced by continuous updates.
After an index rebuild there is always one large dominant segment, and
then many small segments are generated by the ongoing updates.  At query
time we apply early termination with EarlyTerminatingCollector
(https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java),
which triggers EarlyTerminatingCollectorException in SolrIndexSearcher
(https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281).
The problem is that the limit can be reached within the dominant segment
alone (it seems to always be traversed first), so documents with recent
updates in the newer segments don't get a chance to be scored.  Is it
possible to customize the segment visiting order in Solr so that the most
recently generated segments are searched first?  Any suggestions are
appreciated.

Thanks,
Wei

Re: Question for customize index segment search order

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Wei.
Pardon me for sending you back to the Lucene side.
Here's the loop over segments:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L674
So, presumably:
 - a custom searcher could loop over the segments out of order
 - a custom wrapper over the index reader could yield the list of child
contexts in reverse order
 - some code around the NRT commit could put recent segments at the beginning.
I'm not aware of any existing implementation of these, but it seems like
something that is needed often.
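
To make the effect of visiting order concrete, here is a minimal plain-Java
sketch. It does not use Lucene or Solr at all: Segment, sampleSegments,
collect and newestFirst are hypothetical names invented for this
illustration, and the model assumes every document matches the query. It
only shows how a collector that stops after a fixed number of documents
behaves when the same segments are walked oldest-first versus newest-first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SegmentOrderSketch {

    /** Hypothetical stand-in for an index segment: a name plus the doc ids it holds. */
    static final class Segment {
        final String name;
        final List<Integer> docIds;

        Segment(String name, Integer... docIds) {
            this.name = name;
            this.docIds = Arrays.asList(docIds);
        }
    }

    /** Oldest segment first: one dominant segment followed by two small, newer ones. */
    static List<Segment> sampleSegments() {
        return Arrays.asList(
                new Segment("dominant", 0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                new Segment("update-1", 10, 11),
                new Segment("update-2", 12));
    }

    /** Walks the segments in the given order and stops once maxDocs docs are collected. */
    static List<Integer> collect(List<Segment> segments, int maxDocs) {
        List<Integer> collected = new ArrayList<>();
        for (Segment segment : segments) {
            for (int docId : segment.docIds) {
                if (collected.size() >= maxDocs) {
                    return collected; // early termination: remaining segments are never visited
                }
                collected.add(docId);
            }
        }
        return collected;
    }

    /** Returns a reversed copy so the most recently created segment is visited first. */
    static List<Segment> newestFirst(List<Segment> segments) {
        List<Segment> reversed = new ArrayList<>(segments);
        Collections.reverse(reversed);
        return reversed;
    }

    public static void main(String[] args) {
        List<Segment> segments = sampleSegments();
        System.out.println(collect(segments, 5));              // [0, 1, 2, 3, 4]
        System.out.println(collect(newestFirst(segments), 5)); // [12, 10, 11, 0, 1]
    }
}
```

With a limit of 5, the oldest-first walk never leaves the dominant segment,
while the reversed walk collects the three recent documents before falling
back to the dominant segment. A real implementation would have to apply the
same reordering to the leaf contexts seen by the searcher, which this sketch
does not attempt.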

On Fri, May 12, 2023 at 12:03 AM Wei <we...@gmail.com> wrote:

> Hi,
>
> We have an index with multiple segments produced by continuous updates.
> After an index rebuild there is always one large dominant segment, and
> then many small segments are generated by the ongoing updates.  At query
> time we apply early termination with EarlyTerminatingCollector
> (https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java),
> which triggers EarlyTerminatingCollectorException in SolrIndexSearcher
> (https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281).
> The problem is that the limit can be reached within the dominant segment
> alone (it seems to always be traversed first), so documents with recent
> updates in the newer segments don't get a chance to be scored.  Is it
> possible to customize the segment visiting order in Solr so that the most
> recently generated segments are searched first?  Any suggestions are
> appreciated.
>
> Thanks,
> Wei
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
