Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/07/03 14:14:04 UTC
[jira] [Updated] (LUCENE-6645) BKD tree queries should use BitDocIdSet.Builder
[ https://issues.apache.org/jira/browse/LUCENE-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6645:
---------------------------------
Attachment: LUCENE-6645.patch
I played with the benchmark a bit and got similar results (1.76 sec on trunk and more than 4 sec with the patch). This is a worst case for BitDocIdSetBuilder, since it always starts by building a SparseFixedBitSet and eventually upgrades it to a FixedBitSet. Still, it's disappointing that it's so much slower than building a FixedBitSet directly.
I've experimented with a more brute-force approach (see attached patch) that uses a plain int[] instead of a SparseFixedBitSet for the sparse case, and it seems to perform better: the benchmark runs in 1.76 sec on trunk and 2.70 sec with the patch if the builder is configured to use an int[] as long as the number of docs stays below maxDoc / 128. It goes down to 1.96 sec with a threshold of maxDoc / 2048. Maybe this is what we should use instead of BitDocIdSetBuilder?
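For illustration, the brute-force idea can be sketched roughly as follows. This is not the code in LUCENE-6645.patch: the class name IntArrayDocIdSetBuilderSketch and its methods are hypothetical, java.util.BitSet stands in for Lucene's FixedBitSet so that the snippet is self-contained, and sorting plus deduplicating in build() is an assumption about how the sparse result would be turned into an ordered doc ID list.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of an int[]-then-bitset builder; not the attached patch. */
final class IntArrayDocIdSetBuilderSketch {
  private final int maxDoc;
  private final int threshold; // max number of buffered doc IDs before upgrading
  private int[] buffer = new int[16];
  private int size = 0;
  private BitSet bits; // stand-in for FixedBitSet; null while still sparse

  /** divisor = 128 gives a threshold of maxDoc / 128, divisor = 2048 gives maxDoc / 2048. */
  IntArrayDocIdSetBuilderSketch(int maxDoc, int divisor) {
    this.maxDoc = maxDoc;
    this.threshold = Math.max(1, maxDoc / divisor);
  }

  void add(int doc) {
    if (bits != null) { // already dense: just set the bit
      bits.set(doc);
      return;
    }
    if (size == threshold) { // too many docs for the sparse form: upgrade
      upgrade();
      bits.set(doc);
      return;
    }
    if (size == buffer.length) { // grow the buffer, but never past the threshold
      buffer = Arrays.copyOf(buffer, Math.min(threshold, size * 2));
    }
    buffer[size++] = doc;
  }

  private void upgrade() {
    bits = new BitSet(maxDoc);
    for (int i = 0; i < size; i++) {
      bits.set(buffer[i]);
    }
    buffer = null;
  }

  boolean isDense() {
    return bits != null;
  }

  /** Returns the collected doc IDs in increasing order, without duplicates. */
  int[] build() {
    if (bits != null) {
      return bits.stream().toArray();
    }
    int[] docs = Arrays.copyOf(buffer, size);
    Arrays.sort(docs);
    int n = 0;
    for (int i = 0; i < docs.length; i++) {
      if (i == 0 || docs[i] != docs[i - 1]) {
        docs[n++] = docs[i]; // keep the first occurrence of each doc ID
      }
    }
    return Arrays.copyOf(docs, n);
  }
}
```

In this sketch a larger divisor means the builder gives up on the int[] sooner, which matches the direction of the numbers above: maxDoc / 2048 was closer to trunk than maxDoc / 128 on this benchmark.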
I tried to see how this affects our luceneutil benchmark and there is barely any change:
{noformat}
Task              QPS baseline   StdDev  QPS patch   StdDev  Pct diff
Fuzzy1                   74.41  (18.3%)      69.59  (19.4%)   -6.5% ( -37% -   38%)
LowTerm                 761.39   (2.4%)     749.20   (3.6%)   -1.6% (  -7% -    4%)
OrNotHighLow            877.81   (2.2%)     867.60   (5.3%)   -1.2% (  -8% -    6%)
OrHighNotMed             76.63   (2.1%)      75.89   (2.7%)   -1.0% (  -5% -    3%)
MedTerm                 309.75   (1.3%)     306.86   (2.6%)   -0.9% (  -4% -    2%)
OrHighHigh               26.86   (5.4%)      26.64   (3.3%)   -0.8% (  -9% -    8%)
OrNotHighHigh            67.94   (1.0%)      67.42   (2.0%)   -0.8% (  -3% -    2%)
HighTerm                132.28   (1.4%)     131.29   (1.7%)   -0.7% (  -3% -    2%)
Respell                  78.71   (2.8%)      78.14   (3.2%)   -0.7% (  -6% -    5%)
LowPhrase               121.23   (0.8%)     120.47   (1.3%)   -0.6% (  -2% -    1%)
OrHighNotLow            112.94   (2.3%)     112.25   (2.5%)   -0.6% (  -5% -    4%)
OrNotHighMed            223.81   (2.4%)     222.52   (3.8%)   -0.6% (  -6% -    5%)
OrHighLow                71.79   (4.3%)      71.39   (3.3%)   -0.6% (  -7% -    7%)
MedSpanNear              23.33   (1.1%)      23.21   (1.8%)   -0.5% (  -3% -    2%)
AndHighHigh              62.01   (1.9%)      61.71   (3.6%)   -0.5% (  -5% -    5%)
OrHighMed                41.79   (5.5%)      41.61   (3.6%)   -0.4% (  -9% -    9%)
AndHighMed               90.86   (2.0%)      90.61   (2.8%)   -0.3% (  -5% -    4%)
HighSloppyPhrase         47.43   (4.6%)      47.33   (4.8%)   -0.2% (  -9% -    9%)
HighPhrase               28.36   (1.6%)      28.30   (1.3%)   -0.2% (  -3% -    2%)
MedPhrase               147.25   (1.4%)     146.99   (1.6%)   -0.2% (  -3% -    2%)
LowSloppyPhrase          37.07   (2.2%)      37.03   (2.3%)   -0.1% (  -4% -    4%)
MedSloppyPhrase         156.95   (3.7%)     156.80   (3.6%)   -0.1% (  -7% -    7%)
LowSpanNear              29.05   (2.2%)      29.02   (2.0%)   -0.1% (  -4% -    4%)
OrHighNotHigh            61.13   (1.5%)      61.08   (1.6%)   -0.1% (  -3% -    3%)
HighSpanNear             15.36   (1.7%)      15.36   (1.8%)    0.0% (  -3% -    3%)
Wildcard                111.57   (3.1%)     113.05   (2.1%)    1.3% (  -3% -    6%)
IntNRQ                    7.49   (7.3%)       7.60   (5.2%)    1.4% ( -10% -   14%)
Prefix3                  72.81   (4.6%)      74.18   (4.1%)    1.9% (  -6% -   11%)
AndHighLow              974.36   (3.0%)     994.46   (2.9%)    2.1% (  -3% -    8%)
Fuzzy2                   47.42  (16.1%)      53.71  (16.5%)   13.3% ( -16% -   54%)
{noformat}
I suspect this is because our multi-term queries in this benchmark match some high-frequency terms so the upgrade to a FixedBitSet happens quickly anyway.
> BKD tree queries should use BitDocIdSet.Builder
> -----------------------------------------------
>
> Key: LUCENE-6645
> URL: https://issues.apache.org/jira/browse/LUCENE-6645
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: LUCENE-6645.patch, LUCENE-6645.patch
>
>
> When I was iterating on BKD tree originally I remember trying to use this builder (which makes a sparse bit set at first and then upgrades to dense if enough bits get set) and being disappointed with its performance.
> I wound up just making a FixedBitSet every time, but this is obviously wasteful for small queries.
> It could be the perf was poor because I was always .or'ing in DISIs that had 512 - 1024 hits each time (the size of each leaf cell in the BKD tree)? I also had to make my own DISI wrapper around each leaf cell... maybe that was the source of the slowness, not sure.
> I also sort of wondered whether the SmallDocSet in spatial module (backed by a SentinelIntSet) might be faster ... though it'd need to be sorted in the end after building before returning to Lucene.