Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/12/04 19:09:11 UTC
[jira] [Created] (LUCENE-6919) Change the Scorer API to expose an
iterator instead of extending DocIdSetIterator
Adrien Grand created LUCENE-6919:
------------------------------------
Summary: Change the Scorer API to expose an iterator instead of extending DocIdSetIterator
Key: LUCENE-6919
URL: https://issues.apache.org/jira/browse/LUCENE-6919
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
I was trying to address the performance regression on LUCENE-6815, but this is hard to do without introducing specializations of DisjunctionScorer, which I'd like to avoid at all costs.
I think the performance regression would be easy to address without specialization if Scorers were changed to return an iterator instead of extending DocIdSetIterator. So conceptually the API would move from
{code}
abstract class Scorer extends DocIdSetIterator {
}
{code}
to
{code}
abstract class Scorer {
  // the iterator is obtained from the scorer instead of being the scorer
  abstract DocIdSetIterator iterator();
}
{code}
This would help me because then if none of the sub clauses support two-phase iteration, DisjunctionScorer could directly return the approximation as an iterator instead of having to check if twoPhase == null at every iteration.
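To make the difference concrete, here is a minimal sketch of the two shapes. It does not use the real Lucene classes: DocIdSetIterator is a simplified stand-in that only models nextDoc(), and MatchCheck, ExtendingScorer, ExposingScorer, over() and drain() are hypothetical names made up for this illustration. The point is only that the exposing variant makes the twoPhase == null decision once, when the iterator is requested, rather than on every call to nextDoc().

```java
import java.util.ArrayList;
import java.util.List;

public class DisjunctionIteratorSketch {

    /** Simplified stand-in for Lucene's DocIdSetIterator; only nextDoc() is modelled. */
    public interface DocIdSetIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();
    }

    /** Hypothetical stand-in for the verification phase of two-phase iteration. */
    public interface MatchCheck {
        boolean matches(int doc);
    }

    /** Iterator over a sorted array of doc ids, playing the role of the approximation. */
    public static DocIdSetIterator over(int... docs) {
        return new DocIdSetIterator() {
            private int i = 0;

            @Override
            public int nextDoc() {
                return i < docs.length ? docs[i++] : NO_MORE_DOCS;
            }
        };
    }

    /** Current shape: the scorer is the iterator, so it tests twoPhase on every call. */
    public static class ExtendingScorer implements DocIdSetIterator {
        private final DocIdSetIterator approximation;
        private final MatchCheck twoPhase; // null when no sub clause supports two-phase iteration

        public ExtendingScorer(DocIdSetIterator approximation, MatchCheck twoPhase) {
            this.approximation = approximation;
            this.twoPhase = twoPhase;
        }

        @Override
        public int nextDoc() {
            int doc = approximation.nextDoc();
            // the null check is repeated for every candidate document
            while (doc != NO_MORE_DOCS && twoPhase != null && !twoPhase.matches(doc)) {
                doc = approximation.nextDoc();
            }
            return doc;
        }
    }

    /** Proposed shape: the scorer exposes an iterator and chooses it once. */
    public static class ExposingScorer {
        private final DocIdSetIterator approximation;
        private final MatchCheck twoPhase; // null when no sub clause supports two-phase iteration

        public ExposingScorer(DocIdSetIterator approximation, MatchCheck twoPhase) {
            this.approximation = approximation;
            this.twoPhase = twoPhase;
        }

        public DocIdSetIterator iterator() {
            if (twoPhase == null) {
                return approximation; // no wrapper and no per-document null check
            }
            return () -> {
                int doc = approximation.nextDoc();
                while (doc != DocIdSetIterator.NO_MORE_DOCS && !twoPhase.matches(doc)) {
                    doc = approximation.nextDoc();
                }
                return doc;
            };
        }
    }

    /** Collects every doc id produced by the iterator. */
    public static List<Integer> drain(DocIdSetIterator it) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        DocIdSetIterator approximation = over(1, 4, 9);
        ExposingScorer plain = new ExposingScorer(approximation, null);
        // Without two-phase support the approximation itself is handed back.
        System.out.println(plain.iterator() == approximation);

        // With a verification phase, only confirmed documents are returned.
        ExposingScorer filtered = new ExposingScorer(over(1, 4, 9), doc -> doc % 2 == 1);
        System.out.println(drain(filtered.iterator()));
    }
}
```

In the exposing variant the caller iterates over whatever iterator() returned, so the scorer adds no frame to the hot loop when there is nothing to verify.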
Such an approach could also help remove some method calls. For instance, TermScorer.nextDoc calls PostingsEnum.nextDoc, but with this change TermScorer.iterator() could return the PostingsEnum directly and TermScorer would not even appear in stack traces when scoring. I hacked up a patch to see how much that would help, and luceneutil seems to like the change:
{noformat}
Task             QPS baseline   StdDev  QPS patch   StdDev  Pct diff
Fuzzy1                  88.54  (15.7%)      86.73  (16.6%)  -2.0% ( -29% - 35%)
AndHighLow             698.98   (4.1%)     691.11   (5.1%)  -1.1% ( -9% - 8%)
Fuzzy2                  26.47  (11.2%)      26.28  (10.3%)  -0.7% ( -19% - 23%)
MedSpanNear            141.03   (3.3%)     140.51   (3.2%)  -0.4% ( -6% - 6%)
HighPhrase              60.66   (2.6%)      60.48   (3.3%)  -0.3% ( -5% - 5%)
LowSpanNear             29.25   (2.4%)      29.21   (2.1%)  -0.1% ( -4% - 4%)
MedPhrase               28.32   (1.9%)      28.28   (2.0%)  -0.1% ( -3% - 3%)
LowPhrase               17.31   (2.1%)      17.29   (2.6%)  -0.1% ( -4% - 4%)
HighSloppyPhrase        10.93   (6.0%)      10.92   (6.0%)  -0.1% ( -11% - 12%)
MedSloppyPhrase         72.21   (2.2%)      72.27   (1.8%)   0.1% ( -3% - 4%)
Respell                 57.35   (3.2%)      57.41   (3.4%)   0.1% ( -6% - 6%)
HighSpanNear            26.71   (3.0%)      26.75   (2.5%)   0.1% ( -5% - 5%)
OrNotHighLow           803.46   (3.4%)     807.03   (4.2%)   0.4% ( -6% - 8%)
LowSloppyPhrase         88.02   (3.4%)      88.77   (2.5%)   0.8% ( -4% - 7%)
OrNotHighMed           200.45   (2.7%)     203.83   (2.5%)   1.7% ( -3% - 7%)
OrHighHigh              38.98   (7.9%)      40.30   (6.6%)   3.4% ( -10% - 19%)
HighTerm                92.53   (5.3%)      95.94   (5.8%)   3.7% ( -7% - 15%)
OrHighMed               53.80   (7.7%)      55.79   (6.6%)   3.7% ( -9% - 19%)
AndHighMed             266.69   (1.7%)     277.15   (2.5%)   3.9% ( 0% - 8%)
Prefix3                 44.68   (5.4%)      46.60   (7.0%)   4.3% ( -7% - 17%)
MedTerm                261.52   (4.9%)     273.52   (5.4%)   4.6% ( -5% - 15%)
Wildcard                42.39   (6.1%)      44.35   (7.8%)   4.6% ( -8% - 19%)
IntNRQ                  10.46   (7.0%)      10.99   (9.5%)   5.0% ( -10% - 23%)
OrNotHighHigh           67.15   (4.6%)      70.65   (4.5%)   5.2% ( -3% - 15%)
OrHighNotHigh           43.07   (5.1%)      45.36   (5.4%)   5.3% ( -4% - 16%)
OrHighLow               64.19   (6.4%)      67.72   (5.5%)   5.5% ( -6% - 18%)
AndHighHigh             64.17   (2.3%)      67.87   (2.1%)   5.8% ( 1% - 10%)
LowTerm                642.94  (10.9%)     681.48   (8.5%)   6.0% ( -12% - 28%)
OrHighNotMed            12.68   (6.9%)      13.51   (6.6%)   6.5% ( -6% - 21%)
OrHighNotLow            54.69   (6.8%)      58.25   (7.0%)   6.5% ( -6% - 21%)
{noformat}
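For the TermScorer case mentioned before the benchmark, the idea can be sketched as follows. Again this is not the actual Lucene code or the patch: DocIdSetIterator and PostingsEnum are simplified stand-ins defined locally, and the scoring formula (raw term frequency) is a placeholder chosen only to keep the example short. What it shows is that iterator() can hand back the postings object itself, so iteration never goes through TermScorer.

```java
public class TermScorerSketch {

    /** Simplified stand-in for Lucene's DocIdSetIterator; only nextDoc() is modelled. */
    public interface DocIdSetIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();
    }

    /** Simplified stand-in for PostingsEnum: a doc id iterator that also reports term frequency. */
    public static class PostingsEnum implements DocIdSetIterator {
        private final int[] docs;
        private final int[] freqs;
        private int i = -1;

        public PostingsEnum(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override
        public int nextDoc() {
            i++;
            return i < docs.length ? docs[i] : NO_MORE_DOCS;
        }

        /** Frequency of the term in the current document; only valid while positioned on a doc. */
        public int freq() {
            return freqs[i];
        }
    }

    /** Scorer that exposes its postings as the iterator instead of wrapping nextDoc(). */
    public static class TermScorer {
        private final PostingsEnum postings;

        public TermScorer(PostingsEnum postings) {
            this.postings = postings;
        }

        public DocIdSetIterator iterator() {
            return postings; // callers advance the postings directly, with no delegating call
        }

        /** Placeholder scoring (assumption): the raw term frequency of the current document. */
        public float score() {
            return postings.freq();
        }
    }

    public static void main(String[] args) {
        PostingsEnum postings = new PostingsEnum(new int[] {2, 5}, new int[] {3, 1});
        TermScorer scorer = new TermScorer(postings);
        DocIdSetIterator it = scorer.iterator();
        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            System.out.println("doc=" + doc + " score=" + scorer.score());
        }
    }
}
```

The scorer is still consulted for score(), but the per-document nextDoc() call lands on the postings without an intermediate method.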