Posted to dev@lucene.apache.org by "Jack Krupansky (JIRA)" <ji...@apache.org> on 2013/06/09 23:54:20 UTC
[jira] [Commented] (LUCENE-5049) Native (C++) implementation of "pure OR" BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679182#comment-13679182 ]
Jack Krupansky commented on LUCENE-5049:
----------------------------------------
Okay, I'll be the first to ask it: C++? Really? Is this the beginning of the end for Java for the world of high-performance search?
Seriously, a second question: What about alternative JVM-based languages? I mean, maybe Java does have excess baggage related to its quirky semantics, but could the raw JVM support a lower-level implementation of BQ, without leaving the JVM... "bubble"? OTOH, maybe different JVMs could have different performance characteristics.
Oh, and what compiler/machine architecture was this for?
Another question: might there be alternative representations of BQ based on what exactly the clauses are?
OTOH, for us Solr guys, there is somewhat the impression that raw Lucene search is blazing fast already and not the bottleneck for Solr, where other things, like caches, facets, and highlighting, are the concern.
Finally, some of these gains seem... marginal if not outright disappointing considering the raw expectation that bare C++ should be a LOT faster. So, is this maybe more of a "See, C++ doesn't have THAT big an advantage over Java even for core search operations"?
> Native (C++) implementation of "pure OR" BooleanQuery
> -----------------------------------------------------
>
> Key: LUCENE-5049
> URL: https://issues.apache.org/jira/browse/LUCENE-5049
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-5049.patch
>
>
> I've been playing with a C++ implementation of BooleanQuery containing
> only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
> The results are impressive: ~3X speedup for BQ OR over two terms, and
> also good speedups (~38-78%) for Fuzzy1/2 as well since they rewrite
> to BQ OR over N terms:
> {noformat}
> Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
> MedTerm              69.47  (15.8%)     68.61  (13.4%)     -1.2% ( -26% - 33%)
> HighTerm             55.25  (16.2%)     54.63  (13.9%)     -1.1% ( -26% - 34%)
> LowTerm             333.10   (9.6%)    329.43   (8.0%)     -1.1% ( -17% - 18%)
> IntNRQ                3.37   (2.6%)      3.36   (4.6%)     -0.2% ( -7% - 7%)
> Prefix3              18.91   (2.0%)     19.04   (3.5%)      0.7% ( -4% - 6%)
> Wildcard             29.40   (1.7%)     29.70   (2.8%)      1.0% ( -3% - 5%)
> MedPhrase           132.69   (6.2%)    134.66   (7.0%)      1.5% ( -11% - 15%)
> HighSloppyPhrase      0.82   (3.6%)      0.83   (3.5%)      1.9% ( -5% - 9%)
> AndHighHigh          19.65   (0.6%)     20.02   (0.8%)      1.9% ( 0% - 3%)
> HighPhrase           11.74   (6.6%)     11.96   (7.1%)      1.9% ( -11% - 16%)
> MedSloppyPhrase      29.09   (1.2%)     29.76   (1.9%)      2.3% ( 0% - 5%)
> LowSloppyPhrase      25.71   (1.4%)     26.98   (1.7%)      4.9% ( 1% - 8%)
> Respell             173.78   (3.0%)    182.41   (3.7%)      5.0% ( -1% - 12%)
> MedSpanNear          27.67   (2.5%)     29.07   (2.4%)      5.1% ( 0% - 10%)
> HighSpanNear          2.95   (2.4%)      3.10   (2.8%)      5.4% ( 0% - 10%)
> LowSpanNear           8.29   (3.4%)      8.82   (3.3%)      6.4% ( 0% - 13%)
> AndHighMed           79.32   (1.6%)     84.44   (1.0%)      6.5% ( 3% - 9%)
> LowPhrase            23.20   (2.0%)     25.14   (1.6%)      8.4% ( 4% - 12%)
> AndHighLow          594.17   (3.4%)    660.32   (1.9%)     11.1% ( 5% - 16%)
> Fuzzy2               88.32   (6.4%)    121.44   (1.7%)     37.5% ( 27% - 48%)
> Fuzzy1               86.34   (6.0%)    153.49   (1.7%)     77.8% ( 66% - 90%)
> OrHighHigh           16.29   (2.5%)     48.29   (1.3%)    196.5% ( 188% - 205%)
> OrHighMed            28.98   (2.7%)     87.81   (0.9%)    203.0% ( 194% - 212%)
> OrHighLow            27.38   (2.6%)     84.94   (1.1%)    210.3% ( 201% - 219%)
> {noformat}
> This is essentially a scaled back attempt at LUCENE-1594 in that it's
> "hardwired" to "just" the "OR of TermQuery" case.
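
For readers unfamiliar with the query shape discussed above (an OR over term clauses, keeping the top N hits by score), here is a minimal, self-contained sketch in plain Java. It is not the attached patch and does not use Lucene's API. The class name PureOrSketch, the Hit type, the topN method, the in-memory int[][] postings, and the constant per-term weights are all assumptions made for illustration; real Lucene scoring goes through a Similarity and reads postings from the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: not Lucene code and not the LUCENE-5049 patch.
public class PureOrSketch {

    /** One scored hit: a document id and its accumulated score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Scores a "pure OR" over terms and returns the best n hits, best first.
     *
     * postings[t] holds the doc ids containing term t; weights[t] is an
     * assumed constant score contribution for term t (a stand-in for a real
     * similarity function). Doc ids must be in the range [0, maxDoc).
     */
    public static List<Hit> topN(int[][] postings, float[] weights, int maxDoc, int n) {
        List<Hit> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }

        // Term-at-a-time accumulation: each matching term adds its weight.
        float[] scores = new float[maxDoc];
        boolean[] matched = new boolean[maxDoc];
        for (int t = 0; t < postings.length; t++) {
            for (int doc : postings[t]) {
                scores[doc] += weights[t];
                matched[doc] = true;
            }
        }

        // Heap ordered so the head is the worst kept hit:
        // lowest score first, and on equal scores the higher doc id first.
        Comparator<Hit> worstFirst = (a, b) -> {
            int c = Float.compare(a.score, b.score);
            return c != 0 ? c : Integer.compare(b.doc, a.doc);
        };
        PriorityQueue<Hit> heap = new PriorityQueue<>(worstFirst);

        for (int doc = 0; doc < maxDoc; doc++) {
            if (!matched[doc]) {
                continue;
            }
            Hit hit = new Hit(doc, scores[doc]);
            if (heap.size() < n) {
                heap.add(hit);
            } else if (worstFirst.compare(hit, heap.peek()) > 0) {
                heap.poll(); // evict the current worst hit
                heap.add(hit);
            }
        }

        // Draining the heap yields worst to best, so reverse at the end.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = { {0, 2, 5}, {2, 3} };
        float[] weights = { 1.0f, 2.0f };
        for (Hit h : topN(postings, weights, 6, 2)) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

The bounded heap keeps at most N entries, so memory for the result stays proportional to N, while the dense score array trades memory proportional to maxDoc for a simple inner loop. Whether the patch makes the same trade-off cannot be determined from this message.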