Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/12/21 08:15:18 UTC
[GitHub] [lucene] gf2121 opened a new issue, #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
gf2121 opened a new issue, #12028:
URL: https://github.com/apache/lucene/issues/12028
### Description
Today, `TermInSetQuery` can be rewritten to a disjunction `BooleanQuery` when it has fewer than 16 terms, so that the query result is materialized lazily. This can significantly improve query performance in cases like `selective_clause AND low_cardinality_field in (xxx)`.
Recently we added `IntField`, `LongField`, `FloatField` and `DoubleField`, which index a value both as points and as doc values (https://github.com/apache/lucene/issues/11199). `xxxField#newExactQuery` can now take advantage of `IndexOrDocValuesQuery` to match against doc values when there is a selective conjunction clause. I wonder if we can have an `xxxField#newSetQuery` that generates a disjunction `BooleanQuery` when the number of points is below 16?
For example:
```
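// Sketch of a method on LongField; newExactQuery is LongField#newExactQuery.
public static Query newSetQuery(String field, long... values) {
  // For small sets, build a disjunction of exact queries so that each clause
  // can use IndexOrDocValuesQuery.
  if (values.length < 16) {
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    for (long value : values) {
      // SHOULD clauses form a disjunction; FILTER would require every value to match.
      builder.add(newExactQuery(field, value), Occur.SHOULD);
    }
    return builder.build();
  }
  // Larger sets fall back to the points-based set query.
  return LongPoint.newSetQuery(field, values);
}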
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] gf2121 commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
Posted by GitBox <gi...@apache.org>.
gf2121 commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1361021593
I benchmarked queries like `_id = '1' AND cardinality_8_field in (1, 2, 3)` on an index of 1M docs. Here are the results:
```
Benchmark       Mode  Cnt   Score    Error   Units
fieldSetQuery  thrpt   10  48.025 ± 16.741  ops/ms
pointSetQuery  thrpt   10   5.514 ±  0.159  ops/ms
```
`fieldSetQuery` is using `LongField#newSetQuery` (see the example above) while `pointSetQuery` is using `LongPoint#newSetQuery`.
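To illustrate why the two strategies can differ when the other clause is selective, here is a toy model in plain Java. It is only a sketch and not Lucene code: `SetQuerySketch`, `eager` and `lazy` are hypothetical names, documents are plain array indices, and the full scan in `eager` merely stands in for "build the complete match set up front". It does not model how Lucene's point index actually finds matches.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy model (not Lucene): docValues[doc] holds the field value of document doc. */
final class SetQuerySketch {

  /** Eager: materialize every doc whose value is in the set, then intersect with the lead docs. */
  static int[] eager(long[] docValues, int[] leadDocs, long... set) {
    long[] sorted = set.clone();
    Arrays.sort(sorted);
    BitSet matches = new BitSet(docValues.length);
    for (int doc = 0; doc < docValues.length; doc++) {
      if (Arrays.binarySearch(sorted, docValues[doc]) >= 0) {
        matches.set(doc);
      }
    }
    return Arrays.stream(leadDocs).filter(matches::get).toArray();
  }

  /** Lazy: look up the value only for the docs produced by the selective clause. */
  static int[] lazy(long[] docValues, int[] leadDocs, long... set) {
    long[] sorted = set.clone();
    Arrays.sort(sorted);
    return Arrays.stream(leadDocs)
        .filter(doc -> Arrays.binarySearch(sorted, docValues[doc]) >= 0)
        .toArray();
  }

  public static void main(String[] args) {
    // Mirror the shape of the benchmark: 1M docs, a field with 8 distinct values,
    // and a selective clause that matches a single document.
    long[] docValues = new long[1_000_000];
    for (int doc = 0; doc < docValues.length; doc++) {
      docValues[doc] = doc % 8;
    }
    int[] leadDocs = {1};
    System.out.println(Arrays.toString(eager(docValues, leadDocs, 1, 2, 3)));
    System.out.println(Arrays.toString(lazy(docValues, leadDocs, 1, 2, 3)));
  }
}
```

Both methods return the same documents. In this toy model the eager variant does work proportional to the whole index, while the lazy variant does work proportional to the number of docs matched by the selective clause.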
[GitHub] [lucene] gf2121 closed issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
Posted by "gf2121 (via GitHub)" <gi...@apache.org>.
gf2121 closed issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
URL: https://github.com/apache/lucene/issues/12028
[GitHub] [lucene] rmuir commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1382953573
I don't think it is good to degrade to `BooleanQuery` when using points or doc values; it will only hurt performance.
Let's add `NumericDocValuesField.newSlowSetQuery()` and `SortedNumericDocValuesField.newSlowSetQuery()` to complement the doc-values based range queries?
The queries in fact already exist, but need to be cleaned up since they have been "hiding" in `lucene/sandbox`. See PR.
[GitHub] [lucene] javanna commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
Posted by "javanna (via GitHub)" <gi...@apache.org>.
javanna commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1411142472
Looks like this issue is addressed with the PR above? Can we close it or is there anything left to do that I am missing?
[GitHub] [lucene] gf2121 commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
Posted by "gf2121 (via GitHub)" <gi...@apache.org>.
gf2121 commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1436240116
Thanks Robert, nice approach!