Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/03/26 01:17:56 UTC

[GitHub] [lucene-solr] mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-604173071
 
 
   I ran some benchmarks using `luceneutil`.
   As the new sort optimization uses a new `LongDocValuesPointSortField` that is not present in `luceneutil`, I had to hack `luceneutil` as follows:
   
   1. I added a sort task on a long field, `TermDateTimeSort`, to `wikimedium.1M.nostopwords.tasks`. This task was present in `wikinightly.tasks`, but was not available for the wikimedium 1M and 10M tasks.
   2. I also indexed the corresponding field `lastModNDV` as a `LongPoint`. Previously it was only indexed as a `NumericDocValuesField`, but the sort optimization needs long values to be indexed both as doc values and as points.
   3. I modified `SearchTask.java` to create the `TopFieldCollector` with `totalHitsThreshold` set to `topN`: `final TopFieldCollector c = TopFieldCollector.create(s, topN, null, topN);`. The sort optimization only works when a total hits threshold is set.
   4. For the patch version, I modified the sort in `TaskParser.java`. Instead of `lastModNDVSort = new Sort(new SortField("lastModNDV", SortField.Type.LONG));` I used the optimized sort: `lastModNDVSort = new Sort(new LongDocValuesPointSortField("lastModNDV"));`
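Why steps 2 and 3 matter can be illustrated with a minimal, hypothetical sketch in plain Java (no Lucene dependency). It assumes an ascending sort on a long field, models the points index as per-block minimum values (`blockMins`) and the doc values as a plain `long[]`. The class `SkipSketch` and its methods are made up for illustration and are not how `LongDocValuesPointSortField` is actually implemented. Once the top-k queue is full and the total hits threshold has been reached, a block whose minimum is not smaller than the current bottom value cannot hold a competitive document, so it is skipped without visiting its documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of skipping non-competitive documents for an
// ascending sort on a long field. NOT Lucene's implementation: per-block minimums
// stand in for the points index, and a plain long[] stands in for doc values.
final class SkipSketch {

    static final class Result {
        final List<Long> topValues; // the k smallest values, ascending
        final int docsVisited;      // how many documents were actually examined

        Result(List<Long> topValues, int docsVisited) {
            this.topValues = topValues;
            this.docsVisited = docsVisited;
        }
    }

    // "Index time": precompute the minimum value of each block of consecutive doc IDs.
    static long[] blockMins(long[] values, int blockSize) {
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        long[] mins = new long[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            long min = Long.MAX_VALUE;
            int end = Math.min((b + 1) * blockSize, values.length);
            for (int doc = b * blockSize; doc < end; doc++) {
                min = Math.min(min, values[doc]);
            }
            mins[b] = min;
        }
        return mins;
    }

    // "Search time": collect the k smallest values (assumes k >= 1),
    // skipping blocks that cannot contain a competitive document.
    static Result topKAscending(long[] values, long[] blockMins, int blockSize,
                                int k, int totalHitsThreshold) {
        // Max-heap of the k smallest values seen so far; peek() is the bottom (worst) entry.
        PriorityQueue<Long> queue = new PriorityQueue<>((a, b) -> Long.compare(b, a));
        int visited = 0;
        for (int b = 0; b < blockMins.length; b++) {
            // Skipping is only allowed once the queue is full and the hit count
            // no longer has to be exact (the threshold has been reached).
            boolean canSkip = queue.size() == k && visited >= totalHitsThreshold;
            if (canSkip && blockMins[b] >= queue.peek()) {
                continue; // no document in this block can beat the current bottom value
            }
            int end = Math.min((b + 1) * blockSize, values.length);
            for (int doc = b * blockSize; doc < end; doc++) {
                visited++;
                long v = values[doc];
                if (queue.size() < k) {
                    queue.add(v);
                } else if (v < queue.peek()) {
                    // Ties lose to earlier doc IDs, so only strictly smaller values compete.
                    queue.poll();
                    queue.add(v);
                }
            }
        }
        List<Long> top = new ArrayList<>(queue);
        top.sort(null); // natural (ascending) order
        return new Result(top, visited);
    }

    public static void main(String[] args) {
        long[] values = {5, 3, 9, 1, 7, 100, 200, 300, 400, 500, 2, 4, 6, 8, 10};
        long[] mins = blockMins(values, 5);
        Result skipped = topKAscending(values, mins, 5, 3, 3);
        Result exact = topKAscending(values, mins, 5, 3, Integer.MAX_VALUE);
        // prints: skipping: top=[1, 2, 3] visited=10
        System.out.println("skipping: top=" + skipped.topValues + " visited=" + skipped.docsVisited);
        // prints: exact:    top=[1, 2, 3] visited=15
        System.out.println("exact:    top=" + exact.topValues + " visited=" + exact.docsVisited);
    }
}
```

Both runs return the same top values, but the run with a small threshold skips the middle block entirely. Passing `Integer.MAX_VALUE` as the threshold (an exact hit count) disables skipping in this model, which is in line with the optimization only kicking in once a total hits threshold is set.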
   
   Here the main point of comparison is `TermDTSort`, as it is the only sort on a long field. The other sorts are included to show whether there is any regression on them.
   
   ---
   wikimedium1m
   | Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       507.20 |   (11.2%) |                  550.02 |   (16.1%) |
   | HighTermMonthSort     |       550.06 |   (10.4%) |                  443.69 |   (16.1%) |
   | HighTermDayOfYearSort |       105.62 |   (24.9%) |                   91.93 |   (22.1%) |
   ---
   wikimedium10m
   | Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       147.64 |   (11.5%) |                  547.80 |    (6.6%) |
   | HighTermMonthSort     |       147.85 |   (12.2%) |                  239.28 |    (7.3%) |
   | HighTermDayOfYearSort |        74.44 |    (7.7%) |                   42.56 |   (12.1%) |
   
   For wikimedium1m, using `LongDocValuesPointSortField` doesn't seem to have much effect, probably because the segments in this index are smaller and the optimization was skipped on them entirely.
   For wikimedium10m, using `LongDocValuesPointSortField` instead of the usual `SortField.Type.LONG` **brings about a 3.7x speedup** (from 147.64 to 547.80 QPS).
   There are also some regressions/speedups for the HighTermMonthSort and HighTermDayOfYearSort sort tasks, and I don't know the reason for them, as these tasks should not be affected.
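The ratios can be recomputed from the two tables with a small helper. The class `QpsDiff` is hypothetical: `speedup` and `pctDiff` are a plain QPS ratio and percentage change, while `withinNoise` (a change no larger than the sum of both StdDev percentages) is only a rough rule of thumb assumed here, not luceneutil's own significance test.

```java
// Hypothetical helper for reading the benchmark tables: a plain ratio and
// percentage change of candidate vs. baseline QPS, plus a crude noise check
// (an assumption, not luceneutil's significance test).
final class QpsDiff {

    // Candidate QPS divided by baseline QPS; 2.0 would mean twice the throughput.
    static double speedup(double baselineQps, double candidateQps) {
        return candidateQps / baselineQps;
    }

    // Percentage change of candidate vs. baseline QPS.
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    // Rough rule of thumb: treat a change as noise if it is no larger than
    // the sum of the two reported StdDev percentages.
    static boolean withinNoise(double pct, double baselineStdDevPct, double candidateStdDevPct) {
        return Math.abs(pct) <= baselineStdDevPct + candidateStdDevPct;
    }

    public static void main(String[] args) {
        String[] tasks = {
            "1M TermDTSort", "1M HighTermMonthSort", "1M HighTermDayOfYearSort",
            "10M TermDTSort", "10M HighTermMonthSort", "10M HighTermDayOfYearSort",
        };
        // Each row, copied from the tables: baseline QPS, baseline StdDev %,
        // candidate QPS, candidate StdDev %
        double[][] rows = {
            {507.20, 11.2, 550.02, 16.1},
            {550.06, 10.4, 443.69, 16.1},
            {105.62, 24.9, 91.93, 22.1},
            {147.64, 11.5, 547.80, 6.6},
            {147.85, 12.2, 239.28, 7.3},
            {74.44, 7.7, 42.56, 12.1},
        };
        for (int i = 0; i < rows.length; i++) {
            double[] r = rows[i];
            double pct = pctDiff(r[0], r[2]);
            System.out.printf("%-26s %5.2fx %+7.1f%% %s%n", tasks[i], speedup(r[0], r[2]), pct,
                    withinNoise(pct, r[1], r[3]) ? "(within noise)" : "");
        }
    }
}
```

Under that rough rule, all wikimedium1m differences (including the +8.4% for TermDTSort) fall within noise, while the wikimedium10m changes exceed it, including HighTermMonthSort (+61.8%) and HighTermDayOfYearSort (-42.8%).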
   
   
   
   
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
