Posted to issues@lucene.apache.org by "Weiming Wu (Jira)" <ji...@apache.org> on 2022/06/21 06:17:00 UTC
[jira] [Comment Edited] (LUCENE-10624) Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock
[ https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556662#comment-17556662 ]
Weiming Wu edited comment on LUCENE-10624 at 6/21/22 6:16 AM:
--------------------------------------------------------------
Added benchmark data to the description.
was (Author: JIRAUSER290435):
Added benchmark data to the content.
> Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock
> ---------------------------------------------------------------------------------
>
> Key: LUCENE-10624
> URL: https://issues.apache.org/jira/browse/LUCENE-10624
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 9.0, 9.1, 9.2
> Reporter: Weiming Wu
> Priority: Major
> Attachments: baseline_sparseTaxis_searchsparse-sorted.0.log, candidate_sparseTaxis_searchsparse-sorted.0.log
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h3. Problem Statement
> We noticed a DocValues read performance regression with the iterative API when upgrading from Lucene 5 to Lucene 9: our latency increased by 50%. The degradation is similar to the one described in https://issues.apache.org/jira/browse/SOLR-9599
> By analyzing profiling data, we found that the methods "advanceWithinBlock" and "advanceExactWithinBlock" for Sparse IndexedDISI are slow in Lucene 9 because of their O(N) doc lookup algorithm.
> h3. Changes
> Replaced the current O(N) lookup algorithm in Sparse IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock" with binary search, which is possible because the docs are stored in ascending order.
> h3. Test
> {code:java}
> ./gradlew tidy
> ./gradlew check {code}
> h3. Benchmark
> Ran the sparseTaxis test cases from luceneutil. The baseline and candidate reports are attached to this issue.
> 1. Most cases show a 5-10% reduction in search latency.
> 2. Some highlights (>20%):
> * *T0 green_pickup_latitude:[40.75 TO 40.9] yellow_pickup_latitude:[40.75 TO 40.9] sort=null*
> ** *Baseline:* 10973978+ hits hits in *726.81967 msec*
> ** *Candidate:* 10973978+ hits hits in *484.544594 msec*
> * *T0 cab_color:y cab_color:g sort=null*
> ** *Baseline:* 2300174+ hits hits in *95.698324 msec*
> ** *Candidate:* 2300174+ hits hits in *78.336193 msec*
> * *T1 cab_color:y cab_color:g sort=null*
> ** *Baseline:* 2300174+ hits hits in *391.565239 msec*
> ** *Candidate:* 300174+ hits hits in *227.592885 msec*
> * *...*
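For readers who want to see the idea behind the "Changes" section in isolation, here is a minimal sketch. It is not the actual patch and not the IndexedDISI code: the class name SparseBlockSearch, both method names, and the in-memory short[] standing in for the block's on-disk doc list are assumptions made for illustration. It only assumes what the description states, namely that the doc IDs in a sparse block are in ascending order, and treats each entry as an unsigned 16-bit value.

```java
// Hypothetical illustration only; names and the short[] layout are assumptions,
// not Lucene's IndexedDISI implementation.
final class SparseBlockSearch {

  // O(N) lookup: scan forward until the first doc >= target.
  // Returns the index of that doc in [from, to), or -1 if there is none.
  static int advanceLinear(short[] docs, int from, int to, int target) {
    for (int i = from; i < to; i++) {
      if (Short.toUnsignedInt(docs[i]) >= target) {
        return i;
      }
    }
    return -1;
  }

  // O(log N) lookup: lower-bound binary search over the same sorted range.
  // Returns the same result as advanceLinear.
  static int advanceBinary(short[] docs, int from, int to, int target) {
    int lo = from;
    int hi = to; // search window is [lo, hi)
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (Short.toUnsignedInt(docs[mid]) < target) {
        lo = mid + 1; // everything up to mid is too small
      } else {
        hi = mid; // mid is a candidate; keep looking to the left
      }
    }
    return lo < to ? lo : -1;
  }

  public static void main(String[] args) {
    // Ascending doc IDs within one block, stored as unsigned 16-bit values.
    short[] docs = {3, 17, 42, (short) 40000, (short) 65535};
    for (int target : new int[] {0, 18, 42, 65535}) {
      System.out.println(
          "target=" + target
              + " linear=" + advanceLinear(docs, 0, docs.length, target)
              + " binary=" + advanceBinary(docs, 0, docs.length, target));
    }
  }
}
```

In this sketch an exact-advance variant would call the same search and then check whether the doc at the returned index equals the target. Whether binary search beats the linear scan for very short forward skips depends on the access pattern, which is what the benchmark above measures.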
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org